cs.LG: 129 papers today
Graph-related (graph learning | graph neural networks | graph optimization, etc.) (10 papers)
【1】 Learning latent causal graphs via mixture oracles
Authors: Bohdan Kivva, Goutham Rajendran, Pradeep Ravikumar, Bryon Aragam
Affiliations: University of Chicago, Carnegie Mellon University
Comments: 37 pages
Link: https://arxiv.org/abs/2106.15563
Abstract: We study the problem of reconstructing a causal graphical model from data in the presence of latent variables. The main problem of interest is recovering the causal structure over the latent variables while allowing for general, potentially nonlinear dependence between the variables. In many practical problems, the dependence between raw observations (e.g. pixels in an image) is much less relevant than the dependence between certain high-level, latent features (e.g. concepts or objects), and this is the setting of interest. We provide conditions under which both the latent representations and the underlying latent causal model are identifiable by a reduction to a mixture oracle. The proof is constructive, and leads to several algorithms for explicitly reconstructing the full graphical model. We discuss efficient algorithms and provide experiments illustrating the algorithms in practice.
【2】 Subgroup Generalization and Fairness of Graph Neural Networks
Authors: Jiaqi Ma, Junwei Deng, Qiaozhu Mei
Affiliations: School of Information, University of Michigan, Department of EECS
Link: https://arxiv.org/abs/2106.15535
Abstract: Despite enormous successful applications of graph neural networks (GNNs) recently, theoretical understandings of their generalization ability, especially for node-level tasks where data are not independent and identically-distributed (IID), have been sparse. The theoretical investigation of the generalization performance is beneficial for understanding fundamental issues (such as fairness) of GNN models and designing better learning methods. In this paper, we present a novel PAC-Bayesian analysis for GNNs under a non-IID semi-supervised learning setup. Moreover, we analyze the generalization performances on different subgroups of unlabeled nodes, which allows us to further study an accuracy-(dis)parity-style (un)fairness of GNNs from a theoretical perspective. Under reasonable assumptions, we demonstrate that the distance between a test subgroup and the training set can be a key factor affecting the GNN performance on that subgroup, which calls special attention to the training node selection for fair learning. Experiments across multiple GNN models and datasets support our theoretical results.
【3】 On Graph Neural Network Ensembles for Large-Scale Molecular Property Prediction
Authors: Edward Elson Kosasih, Joaquin Cabezas, Xavier Sumba, Piotr Bielak, Kamil Tagowski, Kelvin Idanwekhai, Benedict Aaron Tjandra, Arian Rokkum Jamasb
Affiliations: University of Cambridge, United Kingdom; Wrocław University of Science and Technology, Wrocław, Poland; Universitat Rovira i Virgili, Tarragona, Spain; McGill University, Canada; Obafemi Awolowo University, Nigeria; ML Collective
Comments: 7 pages, 1 figure, 1 table
Link: https://arxiv.org/abs/2106.15529
Abstract: In order to advance large-scale graph machine learning, the Open Graph Benchmark Large Scale Challenge (OGB-LSC) was proposed at the KDD Cup 2021. The PCQM4M-LSC dataset defines a molecular HOMO-LUMO property prediction task on about 3.8M graphs. In this short paper, we show our current work-in-progress solution which builds an ensemble of three graph neural networks models based on GIN, Bayesian Neural Networks and DiffPool. Our approach outperforms the provided baseline by 7.6%. Moreover, using uncertainty in our ensemble's prediction, we can identify molecules whose HOMO-LUMO gaps are harder to predict (with Pearson's correlation of 0.5181). We anticipate that this will facilitate active learning.
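Illustrative note: the abstract relates ensemble uncertainty to prediction difficulty through Pearson's correlation. The sketch below is not the authors' code. It assumes, for illustration only, that uncertainty is the standard deviation of predictions across ensemble members and that difficulty is the absolute error of the ensemble mean; the function names and all numbers are hypothetical toy values.

```python
import math
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def ensemble_summary(per_model_preds):
    """Return per-sample mean and spread (std. dev.) across ensemble members.

    per_model_preds holds one list of predictions per model, all same length.
    """
    by_sample = list(zip(*per_model_preds))
    means = [mean(p) for p in by_sample]
    spreads = [pstdev(p) for p in by_sample]
    return means, spreads


# Toy predictions from three hypothetical models on four molecules.
preds = [
    [5.0, 4.1, 6.3, 3.0],
    [5.1, 4.6, 6.2, 3.9],
    [4.9, 3.5, 6.4, 2.2],
]
targets = [5.0, 4.8, 6.3, 4.0]

means, spreads = ensemble_summary(preds)
errors = [abs(m - t) for m, t in zip(means, targets)]
r = pearson(spreads, errors)  # positive when disagreement tracks error
```

In this toy data the molecules on which the models disagree most are also the ones with the largest error, so `r` is strongly positive.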
【4】 GraphAnoGAN: Detecting Anomalous Snapshots from Attributed Graphs
Authors: Siddharth Bhatia, Yiwei Wang, Bryan Hooi, Tanmoy Chakraborty
Affiliations: National University of Singapore; IIIT-Delhi, India
Comments: Accepted at ECML-PKDD 2021
Link: https://arxiv.org/abs/2106.15504
Abstract: Finding anomalous snapshots from a graph has garnered huge attention recently. Existing studies address the problem using shallow learning mechanisms such as subspace selection, ego-network, or community analysis. These models do not take into account the multifaceted interactions between the structure and attributes in the network. In this paper, we propose GraphAnoGAN, an anomalous snapshot ranking framework, which consists of two core components -- generative and discriminative models. Specifically, the generative model learns to approximate the distribution of anomalous samples from the candidate set of graph snapshots, and the discriminative model detects whether the sampled snapshot is from the ground-truth or not. Experiments on 4 real-world networks show that GraphAnoGAN outperforms 6 baselines with a significant margin (28.29% and 22.01% higher precision and recall, respectively, compared to the best baseline, averaged across all datasets).
【5】 Multiple Graph Learning for Scalable Multi-view Clustering
Authors: Tianyu Jiang, Quanxue Gao
Affiliations: Xidian University
Link: https://arxiv.org/abs/2106.15382
Abstract: Graph-based multi-view clustering has become an active topic due to the efficiency in characterizing both the complex structure and relationship between multimedia data. However, existing methods have the following shortcomings: (1) They are inefficient or even fail for graph learning in large scale due to the graph construction and eigen-decomposition. (2) They cannot well exploit both the complementary information and spatial structure embedded in graphs of different views. To well exploit complementary information and tackle the scalability issue plaguing graph-based multi-view clustering, we propose an efficient multiple graph learning model via a small number of anchor points and tensor Schatten p-norm minimization. Specifically, we construct a hidden and tractable large graph by anchor graph for each view and well exploit complementary information embedded in anchor graphs of different views by tensor Schatten p-norm regularizer. Finally, we develop an efficient algorithm, which scales linearly with the data size, to solve our proposed model. Extensive experimental results on several datasets indicate that our proposed method outperforms some state-of-the-art multi-view clustering algorithms.
【6】 DeepGD: A Deep Learning Framework for Graph Drawing Using GNN
Authors: Xiaoqi Wang, Kevin Yen, Yifan Hu, Han-Wei Shen
Affiliations: The Ohio State University, Yahoo Research
Link: https://arxiv.org/abs/2106.15347
Abstract: In the past decades, many graph drawing techniques have been proposed for generating aesthetically pleasing graph layouts. However, it remains a challenging task since different layout methods tend to highlight different characteristics of the graphs. Recently, studies on deep learning based graph drawing algorithms have emerged but they are often not generalizable to arbitrary graphs without re-training. In this paper, we propose a Convolutional Graph Neural Network based deep learning framework, DeepGD, which can draw arbitrary graphs once trained. It attempts to generate layouts by compromising among multiple pre-specified aesthetics considering a good graph layout usually complies with multiple aesthetics simultaneously. In order to balance the trade-off, we propose two adaptive training strategies which adjust the weight factor of each aesthetic dynamically during training. The quantitative and qualitative assessment of DeepGD demonstrates that it is capable of drawing arbitrary graphs effectively, while being flexible at accommodating different aesthetic criteria.
【7】 Generating the Graph Gestalt: Kernel-Regularized Graph Representation Learning
Authors: Kiarash Zahirnia, Ankita Sakhuja, Oliver Schulte, Parmis Nadaf, Ke Li, Xia Hu
Link: https://arxiv.org/abs/2106.15239
Abstract: Recent work on graph generative models has made remarkable progress towards generating increasingly realistic graphs, as measured by global graph features such as degree distribution, density, and clustering coefficients. Deep generative models have also made significant advances through better modelling of the local correlations in the graph topology, which have been very useful for predicting unobserved graph components, such as the existence of a link or the class of a node, from nearby observed graph components. A complete scientific understanding of graph data should address both global and local structure. In this paper, we propose a joint model for both as complementary objectives in a graph VAE framework. Global structure is captured by incorporating graph kernels in a probabilistic model whose loss function is closely related to the maximum mean discrepancy (MMD) between the global structures of the reconstructed and the input graphs. The ELBO objective derived from the model regularizes a standard local link reconstruction term with an MMD term. Our experiments demonstrate a significant improvement in the realism of the generated graph structures, typically by 1-2 orders of magnitude of graph structure metrics, compared to leading graph VAE and GAN models. Local link reconstruction improves as well in many cases.
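Illustrative note: the maximum mean discrepancy (MMD) mentioned in this abstract can be estimated from two samples with a kernel. The sketch below is not the paper's method: it swaps the paper's graph kernels for a Gaussian (RBF) kernel on scalar graph statistics (for example, node degrees), and the function names and the `sigma` bandwidth are assumptions chosen for illustration.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def mmd_squared(xs, ys, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between two samples."""
    kxx = sum(rbf(a, b, sigma) for a in xs for b in xs) / (len(xs) ** 2)
    kyy = sum(rbf(a, b, sigma) for a in ys for b in ys) / (len(ys) ** 2)
    kxy = sum(rbf(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy


# Hypothetical degree samples from an input graph and two generated graphs.
input_degrees = [1.0, 2.0, 2.0, 3.0]
close_degrees = [1.0, 2.0, 3.0, 3.0]
far_degrees = [8.0, 9.0, 9.0, 10.0]

near = mmd_squared(input_degrees, close_degrees)
far = mmd_squared(input_degrees, far_degrees)
```

A generated graph whose degree sample resembles the input yields a smaller value (`near`) than one whose degrees are very different (`far`); identical samples give zero.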
【8】 Leveraging Static Models for Link Prediction in Temporal Knowledge Graphs
Authors: Wessel Radstok, Mel Chekol
Affiliations: Utrecht University
Link: https://arxiv.org/abs/2106.15223
Abstract: The inclusion of temporal scopes of facts in knowledge graph embedding (KGE) presents significant opportunities for improving the resulting embeddings, and consequently for increased performance in downstream applications. Yet, little research effort has focussed on this area and much of the carried out research reports only marginally improved results compared to models trained without temporal scopes (static models). Furthermore, rather than leveraging existing work on static models, they introduce new models specific to temporal knowledge graphs. We propose a novel perspective that takes advantage of the power of existing static embedding models by focussing effort on manipulating the data instead. Our method, SpliMe, draws inspiration from the field of signal processing and early work in graph embedding. We show that SpliMe competes with or outperforms the current state of the art in temporal KGE. Additionally, we uncover issues with the procedure currently used to assess the performance of static models on temporal graphs and introduce two ways to counteract them.
【9】 Evolving-Graph Gaussian Processes
Authors: David Blanco-Mulero, Markus Heinonen, Ville Kyrki
Affiliations: School of Electrical Engineering, Aalto University, Finland; Department of Computer Science
Comments: Accepted for publication at ICML 2021 Time Series Workshop (TSW)
Link: https://arxiv.org/abs/2106.15127
Abstract: Graph Gaussian Processes (GGPs) provide a data-efficient solution on graph structured domains. Existing approaches have focused on static structures, whereas many real graph data represent a dynamic structure, limiting the applications of GGPs. To overcome this we propose evolving-Graph Gaussian Processes (e-GGPs). The proposed method is capable of learning the transition function of graph vertices over time with a neighbourhood kernel to model the connectivity and interaction changes between vertices. We assess the performance of our method on time-series regression problems where graphs evolve over time. We demonstrate the benefits of e-GGPs over static graph Gaussian Process approaches.
【10】 GraphPiece: Efficiently Generating High-Quality Molecular Graph with Substructures
Authors: Xiangzhe Kong, Zhixing Tan, Yang Liu
Affiliations: Dept. of Comp. Sci. & Tech., Institute for AI, BNRist Center, Tsinghua University; Beijing Academy of Artificial Intelligence; Institute for AIR, Tsinghua University
Comments: 15 pages, 9 figures, under review
Link: https://arxiv.org/abs/2106.15098
Abstract: Molecular graph generation is a fundamental but challenging task in various applications such as drug discovery and material science, which requires generating valid molecules with desired properties. Auto-regressive models, which usually construct graphs following sequential actions of adding nodes and edges at the atom-level, have made rapid progress in recent years. However, these atom-level models ignore high-frequency subgraphs that not only capture the regularities of atomic combination in molecules but also are often related to desired chemical properties. In this paper, we propose a method to automatically discover such common substructures, which we call graph pieces, from given molecular graphs. Based on graph pieces, we leverage a variational autoencoder to generate molecules in two phases: piece-level graph generation followed by bond completion. Experiments show that our graph piece variational autoencoder achieves better performance over state-of-the-art baselines on property optimization and constrained property optimization tasks with higher computational efficiency.
Transformer (1 paper)
【1】 Geometry-aware Transformer for molecular property prediction
Authors: Bumju Kwak, Jeonghee Jo, Byunghan Lee, Sungroh Yoon
Affiliations: Recommendation Team, Kakao Corporation, Gyeonggi, Republic of Korea; Interdisciplinary Program in Bioinformatics, Seoul National University, Seoul, Republic of Korea
Comments: 11 pages, 3 figures
Link: https://arxiv.org/abs/2106.15516
Abstract: Recently, graph neural networks (GNNs) have achieved remarkable performances for quantum mechanical problems. However, a graph convolution can only cover a localized region, and cannot capture long-range interactions of atoms. This behavior is contrary to theoretical interatomic potentials, which is a fundamental limitation of the spatial based GNNs. In this work, we propose a novel attention-based framework for molecular property prediction tasks. We represent a molecular conformation as a discrete atomic sequence combined by atom-atom distance attributes, named Geometry-aware Transformer (GeoT). In particular, we adopt a Transformer architecture, which has been widely used for sequential data. Our proposed model trains sequential representations of molecular graphs based on globally constructed attentions, maintaining all spatial arrangements of atom pairs. Our method does not suffer from cost intensive computations, such as angle calculations. The experimental results on several public benchmarks and visualization maps verified that keeping the long-range interatomic attributes can significantly improve the model predictability.
GAN | adversarial | attack | generation-related (15 papers)
【1】 Generalization of Reinforcement Learning with Policy-Aware Adversarial Data Augmentation
Authors: Hanping Zhang, Yuhong Guo
Affiliations: Carleton University, Ottawa, Canada
Link: https://arxiv.org/abs/2106.15587
Abstract: The generalization gap in reinforcement learning (RL) has been a significant obstacle that prevents the RL agent from learning general skills and adapting to varying environments. Increasing the generalization capacity of the RL systems can significantly improve their performance on real-world working environments. In this work, we propose a novel policy-aware adversarial data augmentation method to augment the standard policy learning method with automatically generated trajectory data. Different from the commonly used observation transformation based data augmentations, our proposed method adversarially generates new trajectory data based on the policy gradient objective and aims to more effectively increase the RL agent's generalization ability with the policy-aware data augmentation. Moreover, we further deploy a mixup step to integrate the original and generated data to enhance the generalization capacity while mitigating the over-deviation of the adversarial data. We conduct experiments on a number of RL tasks to investigate the generalization performance of the proposed method by comparing it with the standard baselines and the state-of-the-art mixreg approach. The results show our method can generalize well with limited training diversity, and achieve the state-of-the-art generalization test performance.
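Illustrative note: the mixup step mentioned in this abstract combines original and generated data. The sketch below shows only the generic mixup idea, a convex combination of two feature vectors with weight `lam`; it is not the authors' implementation, and the function name, the flat-list representation of an observation, and the fixed `lam` are assumptions for illustration.

```python
def mixup(original, generated, lam):
    """Convex combination lam * original + (1 - lam) * generated."""
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    if len(original) != len(generated):
        raise ValueError("inputs must have the same length")
    return [lam * o + (1.0 - lam) * g for o, g in zip(original, generated)]


# Hypothetical observation vectors: one original, one adversarially generated.
obs_original = [1.0, 2.0]
obs_generated = [3.0, 4.0]
mixed = mixup(obs_original, obs_generated, 0.5)
```

With `lam` close to 1 the result stays near the original data, which is one simple way to limit how far the adversarial samples pull the training distribution.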
【2】 Uncertainty-Guided Progressive GANs for Medical Image Translation
Authors: Uddeshya Upadhyay, Yanbei Chen, Tobias Hepp, Sergios Gatidis, Zeynep Akata
Affiliations: University of Tübingen, Max Planck Institute for Intelligent Systems
Comments: Accepted at MICCAI 2021; code is released at https://github.com/ExplainableML/UncerGuidedI2I
Link: https://arxiv.org/abs/2106.15542
Abstract: Image-to-image translation plays a vital role in tackling various medical imaging tasks such as attenuation correction, motion correction, undersampled reconstruction, and denoising. Generative adversarial networks have been shown to achieve the state-of-the-art in generating high fidelity images for these tasks. However, the state-of-the-art GAN-based frameworks do not estimate the uncertainty in the predictions made by the network that is essential for making informed medical decisions and subsequent revision by medical experts and has recently been shown to improve the performance and interpretability of the model. In this work, we propose an uncertainty-guided progressive learning scheme for image-to-image translation. By incorporating aleatoric uncertainty as attention maps for GANs trained in a progressive manner, we generate images of increasing fidelity progressively. We demonstrate the efficacy of our model on three challenging medical image translation tasks, including PET to CT translation, undersampled MRI reconstruction, and MRI motion artefact correction. Our model generalizes well in three different tasks and improves performance over state of the art under full-supervision and weak-supervision with limited data. Code is released here: https://github.com/ExplainableML/UncerGuidedI2I
【3】 Efficient Realistic Data Generation Framework leveraging Deep Learning-based Human Digitization
Authors: C. Symeonidis, P. Nousi, P. Tosidis, K. Tsampazis, N. Passalis, A. Tefas, N. Nikolaidis
Affiliations: Artificial Intelligence and Information Analysis Lab, Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece
Link: https://arxiv.org/abs/2106.15409
Abstract: The performance of supervised deep learning algorithms depends significantly on the scale, quality and diversity of the data used for their training. Collecting and manually annotating large amount of data can be both time-consuming and costly tasks to perform. In the case of tasks related to visual human-centric perception, the collection and distribution of such data may also face restrictions due to legislation regarding privacy. In addition, the design and testing of complex systems, e.g., robots, which often employ deep learning-based perception models, may face severe difficulties as even state-of-the-art methods trained on real and large-scale datasets cannot always perform adequately as they have not adapted to the visual differences between the virtual and the real world data. As an attempt to tackle and mitigate the effect of these issues, we present a method that automatically generates realistic synthetic data with annotations for a) person detection, b) face recognition, and c) human pose estimation. The proposed method takes as input real background images and populates them with human figures in various poses. Instead of using hand-made 3D human models, we propose the use of models generated through deep learning methods, further reducing the dataset creation costs, while maintaining a high level of realism. In addition, we provide open-source and easy to use tools that implement the proposed pipeline, allowing for generating highly-realistic synthetic datasets for a variety of tasks. A benchmarking and evaluation in the corresponding tasks shows that synthetic data can be effectively used as a supplement to real data.
【4】 Attack Transferability Characterization for Adversarially Robust Multi-label Classification
Authors: Zhuo Yang, Yufei Han, Xiangliang Zhang
Affiliations: King Abdullah University of Science and Technology, Thuwal, Saudi Arabia; CIDRE team, Inria, France
Comments: Accepted in ECML-PKDD 2021
Link: https://arxiv.org/abs/2106.15360
Abstract: Despite the pervasive existence of multi-label evasion attack, it is an open yet essential problem to characterize the origin of the adversarial vulnerability of a multi-label learning system and assess its attackability. In this study, we focus on non-targeted evasion attack against multi-label classifiers. The goal of the threat is to cause misclassification with respect to as many labels as possible, with the same input perturbation. Our work gains in-depth understanding about the multi-label adversarial attack by first characterizing the transferability of the attack based on the functional properties of the multi-label classifier. We unveil how the transferability level of the attack determines the attackability of the classifier via establishing an information-theoretic analysis of the adversarial risk. Furthermore, we propose a transferability-centered attackability assessment, named Soft Attackability Estimator (SAE), to evaluate the intrinsic vulnerability level of the targeted multi-label classifier. This estimator is then integrated as a transferability-tuning regularization term into the multi-label learning paradigm to achieve adversarially robust classification. The experimental study on real-world data echoes the theoretical analysis and verifies the validity of the transferability-regularized multi-label learning method.
【5】 Multi-stage Optimization based Adversarial Training
Authors: Xiaosen Wang, Chuanbiao Song, Liwei Wang, Kun He
Affiliations: School of Computer Science and Technology, Huazhong University of Science and Technology; School of Electronics Engineering and Computer Sciences, Peking University
Comments: 13 pages
Link: https://arxiv.org/abs/2106.15357
Abstract: In the field of adversarial robustness, there is a common practice that adopts the single-step adversarial training for quickly developing adversarially robust models. However, the single-step adversarial training is most likely to cause catastrophic overfitting, as after a few training epochs it will be hard to generate strong adversarial examples to continuously boost the adversarial robustness. In this work, we aim to avoid the catastrophic overfitting by introducing multi-step adversarial examples during the single-step adversarial training. Then, to balance the large training overhead of generating multi-step adversarial examples, we propose a Multi-stage Optimization based Adversarial Training (MOAT) method that periodically trains the model on mixed benign examples, single-step adversarial examples, and multi-step adversarial examples stage by stage. In this way, the overall training overhead is reduced significantly, meanwhile, the model could avoid catastrophic overfitting. Extensive experiments on CIFAR-10 and CIFAR-100 datasets demonstrate that under a similar amount of training overhead, the proposed MOAT exhibits better robustness than either single-step or multi-step adversarial training methods.
【6】 Where is the disease? Semi-supervised pseudo-normality synthesis from an abnormal image
Authors: Yuanqi Du, Quan Quan, Hu Han, S. Kevin Zhou
Affiliations: George Mason University; Institute of Computing Technology, Chinese Academy of Sciences
Link: https://arxiv.org/abs/2106.15345
Abstract: Pseudo-normality synthesis, which computationally generates a pseudo-normal image from an abnormal one (e.g., with lesions), is critical in many perspectives, from lesion detection, data augmentation to clinical surgery suggestion. However, it is challenging to generate high-quality pseudo-normal images in the absence of the lesion information. Thus, expensive lesion segmentation data have been introduced to provide lesion information for the generative models and improve the quality of the synthetic images. In this paper, we aim to alleviate the need of a large amount of lesion segmentation data when generating pseudo-normal images. We propose a Semi-supervised Medical Image generative LEarning network (SMILE) which not only utilizes limited medical images with segmentation masks, but also leverages massive medical images without segmentation masks to generate realistic pseudo-normal images. Extensive experiments show that our model outperforms the best state-of-the-art model by up to 6% for data augmentation task and 3% in generating high-quality images. Moreover, the proposed semi-supervised learning achieves comparable medical image synthesis quality with supervised learning model, using only 50% of segmentation data.
【7】 Image Inpainting Using Wasserstein Generative Adversarial Imputation Network
Authors: Daniel Vašata, Tomáš Halama, Magda Friedjungová
Affiliations: Czech Technical University in Prague, Prague, Czech Republic
Comments: To be published in conference proceedings of ICANN 2021
Link: https://arxiv.org/abs/2106.15341
Abstract: Image inpainting is one of the important tasks in computer vision which focuses on the reconstruction of missing regions in an image. The aim of this paper is to introduce an image inpainting model based on Wasserstein Generative Adversarial Imputation Network. The generator network of the model uses building blocks of convolutional layers with different dilation rates, together with skip connections that help the model reproduce fine details of the output. This combination yields a universal imputation model that is able to handle various scenarios of missingness with sufficient quality. To show this experimentally, the model is simultaneously trained to deal with three scenarios given by missing pixels at random, missing various smaller square regions, and one missing square placed in the center of the image. It turns out that our model achieves high-quality inpainting results on all scenarios. Performance is evaluated using peak signal-to-noise ratio and structural similarity index on two real-world benchmark datasets, CelebA faces and Paris StreetView. The results of our model are compared to biharmonic imputation and to some of the other state-of-the-art image inpainting methods.
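Illustrative note: peak signal-to-noise ratio (PSNR), one of the two metrics named in this abstract, is defined as 10 * log10(MAX^2 / MSE). The sketch below is a minimal pure-Python version for flattened pixel lists; it is not the paper's evaluation code, and the function name and the default `max_value` of 255 (8-bit images) are assumptions.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two flattened images."""
    if not reference or len(reference) != len(restored):
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


# Hypothetical 2x2 grayscale patches, flattened row by row.
ground_truth = [100.0, 100.0, 100.0, 100.0]
inpainted = [110.0, 110.0, 110.0, 110.0]
score = psnr(ground_truth, inpainted)
```

Higher values indicate a reconstruction closer to the reference; identical images give an infinite PSNR under this definition.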
【8】 SE-MD: A Single-encoder multiple-decoder deep network for point cloud generation from 2D images
Authors: Abdul Mueed Hafiz, Rouf Ul Alam Bhat, Shabir Ahmad Parah, M. Hassaballah Link: https://arxiv.org/abs/2106.15325 Abstract: 3D model generation from single 2D RGB images is a challenging and actively researched computer vision task. Various techniques using conventional network architectures have been proposed for this task. However, the body of research work is limited and there are various issues like using inefficient 3D representation formats, weak 3D model generation backbones, inability to generate dense point clouds, dependence on post-processing for generation of dense point clouds, and dependence on silhouettes in RGB images. In this paper, a novel 2D RGB image to point cloud conversion technique is proposed, which improves the state of the art in the field due to its efficient, robust and simple model by using the concept of parallelization in network architecture. It not only uses the efficient and rich 3D representation of point clouds, but also uses a novel and robust point cloud generation backbone in order to address the prevalent issues. This involves using a single-encoder multiple-decoder deep network architecture wherein each decoder generates certain fixed viewpoints. This is followed by fusing all the viewpoints to generate a dense point cloud. Various experiments are conducted on the technique, its performance is compared with that of other state-of-the-art techniques, and impressive gains in performance are demonstrated. Code is available at https://github.com/mueedhafiz1982/
【9】 Cascaded Diffusion Models for High Fidelity Image Generation
Authors: Jonathan Ho, Chitwan Saharia, William Chan, David J. Fleet, Mohammad Norouzi, Tim Salimans Affiliations: Google Research Link: https://arxiv.org/abs/2106.15282 Abstract: We show that cascaded diffusion models are capable of generating high fidelity images on the class-conditional ImageNet generation challenge, without any assistance from auxiliary image classifiers to boost sample quality. A cascaded diffusion model comprises a pipeline of multiple diffusion models that generate images of increasing resolution, beginning with a standard diffusion model at the lowest resolution, followed by one or more super-resolution diffusion models that successively upsample the image and add higher resolution details. We find that the sample quality of a cascading pipeline relies crucially on conditioning augmentation, our proposed method of data augmentation of the lower resolution conditioning inputs to the super-resolution models. Our experiments show that conditioning augmentation prevents compounding error during sampling in a cascaded model, helping us to train cascading pipelines achieving FID scores of 1.48 at 64x64, 3.52 at 128x128 and 4.88 at 256x256 resolutions, outperforming BigGAN-deep.
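The conditioning augmentation mentioned in the abstract above can be illustrated with a small sketch. The abstract only says that the low-resolution conditioning inputs are augmented; the use of additive Gaussian noise, the function name `augment_conditioning`, and the `sigma` parameter are assumptions made here for illustration and are not taken from the paper.

```python
import random

def augment_conditioning(low_res, sigma, rng):
    """Return a noisy copy of a low-resolution image (list of pixel rows).

    Assumption: the augmentation is additive Gaussian noise with standard
    deviation `sigma`. The idea is that a super-resolution model trained on
    perturbed conditioning inputs is less sensitive to imperfections in the
    samples produced by the previous stage of the cascade.
    """
    return [[px + rng.gauss(0.0, sigma) for px in row] for row in low_res]

# Hypothetical usage: perturb a 2x2 "image" before passing it to the next stage.
rng = random.Random(0)
low_res = [[0.1, 0.2], [0.3, 0.4]]
noisy = augment_conditioning(low_res, sigma=0.05, rng=rng)
```

Setting `sigma` to zero recovers the unaugmented input, which makes the noise level a natural knob to compare against a baseline without augmentation.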
【10】 Improving Transferability of Adversarial Patches on Face Recognition with Generative Models
Authors: Zihao Xiao, Xianfeng Gao, Chilin Fu, Yinpeng Dong, Wei Gao, Xiaolu Zhang, Jun Zhou, Jun Zhu Affiliations: RealAI, Ant Financial, Tsinghua University, Beijing Institute of Technology, Nanyang Technological University Comments: Accepted by CVPR 2021. Based on the camera ready version, some typos are fixed Link: https://arxiv.org/abs/2106.15058 Abstract: Face recognition is greatly improved by deep convolutional neural networks (CNNs). Recently, these face recognition models have been used for identity authentication in security sensitive applications. However, deep CNNs are vulnerable to adversarial patches, which are physically realizable and stealthy, raising new security concerns on the real-world applications of these models. In this paper, we evaluate the robustness of face recognition models using adversarial patches based on transferability, where the attacker has limited accessibility to the target models. First, we extend the existing transfer-based attack techniques to generate transferable adversarial patches. However, we observe that the transferability is sensitive to initialization and degrades when the perturbation magnitude is large, indicating overfitting to the substitute models. Second, we propose to regularize the adversarial patches on the low dimensional data manifold. The manifold is represented by generative models pre-trained on legitimate human face images. Using face-like features as adversarial perturbations through optimization on the manifold, we show that the gaps between the responses of substitute models and the target models dramatically decrease, exhibiting a better transferability. Extensive digital world experiments are conducted to demonstrate the superiority of the proposed method in the black-box setting. We apply the proposed method in the physical world as well.
【11】 Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent
Authors: Oliver Bryniarski, Nabeel Hingun, Pedro Pachuca, Vincent Wang, Nicholas Carlini Affiliations: UC Berkeley, Google Link: https://arxiv.org/abs/2106.15023 Abstract: Evading adversarial example detection defenses requires finding adversarial examples that must simultaneously (a) be misclassified by the model and (b) be detected as non-adversarial. We find that existing attacks that attempt to satisfy multiple simultaneous constraints often over-optimize against one constraint at the cost of satisfying another. We introduce Orthogonal Projected Gradient Descent, an improved attack technique to generate adversarial examples that avoids this problem by orthogonalizing the gradients when running standard gradient-based attacks. We use our technique to evade four state-of-the-art detection defenses, reducing their accuracy to 0% while maintaining a 0% detection rate.
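The gradient orthogonalization named in the abstract above can be sketched as a plain vector projection. This is a minimal illustration, not the authors' implementation: the helper names, the L2-normalized step, and the omission of the projection back onto the allowed perturbation set are all assumptions made here. The gradients `grad_cls` and `grad_det` are taken as given inputs (classifier loss and detector score, respectively).

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize(g, h):
    """Remove from g its component along h (g is returned unchanged if h is zero)."""
    hh = dot(h, h)
    if hh == 0.0:
        return list(g)
    scale = dot(g, h) / hh
    return [a - scale * b for a, b in zip(g, h)]

def orthogonal_step(x, grad_cls, grad_det, step_size):
    """Move x along the part of grad_cls that is orthogonal to grad_det.

    To first order, such a step changes the classifier loss without changing
    the detector score. A full attack would also project x back onto the
    allowed perturbation set; that step is omitted in this sketch.
    """
    direction = orthogonalize(grad_cls, grad_det)
    norm = math.sqrt(dot(direction, direction))
    if norm == 0.0:
        return list(x)
    return [xi + step_size * di / norm for xi, di in zip(x, direction)]
```

Swapping the roles of the two gradients gives the complementary step, which lowers the detector score while leaving the classifier loss unchanged to first order.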
【12】 Constructing Forest Biomass Prediction Maps from Radar Backscatter by Sequential Regression with a Conditional Generative Adversarial Network
Authors: Sara Björk, Stian Normann Anfinsen, Erik Næsset, Terje Gobakken, Eliakimu Zahabu Affiliations: Norwegian University of Life Sciences Link: https://arxiv.org/abs/2106.15020 Abstract: This paper studies construction of above-ground biomass (AGB) prediction maps from synthetic aperture radar (SAR) intensity images. The purpose is to improve traditional regression models based on SAR intensity, trained with a limited amount of AGB in situ measurements. Although it is costly to collect, data from airborne laser scanning (ALS) sensors are highly correlated with AGB. Therefore, we propose using AGB predictions based on ALS data as surrogate response variables for SAR data in a sequential modelling fashion. This increases the amount of training data dramatically. To model the regression function between SAR intensity and ALS-predicted AGB we propose to utilise a conditional generative adversarial network (cGAN), i.e. the Pix2Pix convolutional neural network. This enables the recreation of existing ALS-based AGB prediction maps. The generated synthesised ALS-based AGB predictions are evaluated qualitatively and quantitatively against ALS-based AGB predictions retrieved from a traditional non-sequential regression model trained in the same area. Results show that the proposed architecture manages to capture characteristics of the actual data. This suggests that the use of ALS-guided generative models is a promising avenue for AGB prediction from SAR intensity. Further research on this area has the potential of providing both large-scale and low-cost predictions of AGB.
【13】 Adversarial Robustness of Streaming Algorithms through Importance Sampling
Authors: Vladimir Braverman, Avinatan Hassidim, Yossi Matias, Mariano Schain, Sandeep Silwal, Samson Zhou Affiliations: Google, MIT, CMU Link: https://arxiv.org/abs/2106.14952 Abstract: In this paper, we introduce adversarially robust streaming algorithms for central machine learning and algorithmic tasks, such as regression and clustering, as well as their more general counterparts, subspace embedding, low-rank approximation, and coreset construction. For regression and other numerical linear algebra related tasks, we consider the row arrival streaming model. Our results are based on a simple, but powerful, observation that many importance sampling-based algorithms give rise to adversarial robustness, in contrast to sketching based algorithms, which are very prevalent in the streaming literature but suffer from adversarial attacks. In addition, we show that the well-known merge and reduce paradigm in streaming is adversarially robust. Since the merge and reduce paradigm allows coreset constructions in the streaming setting, we thus obtain robust algorithms for $k$-means, $k$-median, $k$-center, Bregman clustering, projective clustering, principal component analysis (PCA) and non-negative matrix factorization. To the best of our knowledge, these are the first adversarially robust results for these problems yet require no new algorithmic implementations. Finally, we empirically confirm the robustness of our algorithms on various adversarial attacks and demonstrate that by contrast, some common existing algorithms are not robust. (Abstract shortened to meet arXiv limits)
【14】 A Mixed-Supervision Multilevel GAN Framework for Image Quality Enhancement
Authors: Uddeshya Upadhyay, Suyash Awate Affiliations: Computer Science and Engineering, Indian Institute of Technology, Bombay, India Comments: MICCAI 2019 Link: https://arxiv.org/abs/2106.15575 Abstract: Deep neural networks for image quality enhancement typically need large quantities of highly-curated training data comprising pairs of low-quality images and their corresponding high-quality images. While high-quality image acquisition is typically expensive and time-consuming, medium-quality images are faster to acquire, at lower equipment costs, and available in larger quantities. Thus, we propose a novel generative adversarial network (GAN) that can leverage training data at multiple levels of quality (e.g., high and medium quality) to improve performance while limiting costs of data curation. We apply our mixed-supervision GAN to (i) super-resolve histopathology images and (ii) enhance laparoscopy images by combining super-resolution and surgical smoke removal. Results on large clinical and pre-clinical datasets show the benefits of our mixed-supervision GAN over the state of the art.
【15】 GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Authors: Jinhyeok Yang, Jae-Sung Bae, Taejun Bak, Youngik Kim, Hoon-Young Cho Affiliations: Speech AI Lab, NCSOFT, Republic of Korea Comments: Accepted to INTERSPEECH 2021 Link: https://arxiv.org/abs/2106.15153 Abstract: Recent advances in neural multi-speaker text-to-speech (TTS) models have enabled the generation of reasonably good speech quality with a single model and made it possible to synthesize the speech of a speaker with limited training data. Fine-tuning to the target speaker data with the multi-speaker model can achieve better quality; however, there still exists a gap compared to the real speech sample and the model depends on the speaker. In this work, we propose GANSpeech, which is a high-fidelity multi-speaker TTS model that adopts the adversarial training method to a non-autoregressive multi-speaker TTS model. In addition, we propose simple but efficient automatic scaling methods for feature matching loss used in adversarial training. In the subjective listening tests, GANSpeech significantly outperformed the baseline multi-speaker FastSpeech and FastSpeech2 models, and showed a better MOS score than the speaker-specific fine-tuned FastSpeech2.
Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (6 papers)
【1】 As easy as APC: Leveraging self-supervised learning in the context of time series classification with varying levels of sparsity and severe class imbalance
Authors: Fiorella Wever, T. Anderson Keller, Victor Garcia, Laura Symul Affiliations: University of Amsterdam, Germany, Department of Statistics, Stanford University Link: https://arxiv.org/abs/2106.15577 Abstract: High levels of sparsity and strong class imbalance are ubiquitous challenges that are often presented simultaneously in real-world time series data. While most methods tackle each problem separately, our proposed approach handles both in conjunction, while imposing fewer assumptions on the data. In this work, we propose leveraging a self-supervised learning method, specifically Autoregressive Predictive Coding (APC), to learn relevant hidden representations of time series data in the context of both missing data and class imbalance. We apply APC using either a GRU or GRU-D encoder on two real-world datasets, and show that applying one-step-ahead prediction with APC improves the classification results in all settings. In fact, by applying GRU-D - APC, we achieve state-of-the-art AUPRC results on the Physionet benchmark.
【2】 Semi-supervised learning with Bayesian Confidence Propagation Neural Network
Authors: Naresh Balaji Ravichandran, Anders Lansner, Pawel Herman Affiliations: Computational Brain Science Lab, KTH Royal Institute of Technology, Stockholm, Sweden, Department of Mathematics, Stockholm University Link: https://arxiv.org/abs/2106.15546 Abstract: Learning internal representations from data using no or few labels is useful for machine learning research, as it allows using massive amounts of unlabeled data. In this work, we use the Bayesian Confidence Propagation Neural Network (BCPNN) model developed as a biologically plausible model of the cortex. Recent work has demonstrated that these networks can learn useful internal representations from data using local Bayesian-Hebbian learning rules. In this work, we show how such representations can be leveraged in a semi-supervised setting by introducing and comparing different classifiers. We also evaluate and compare such networks with other popular semi-supervised classifiers.
【3】 Self-Contrastive Learning
Authors: Sangmin Bae, Sungnyun Kim, Jongwoo Ko, Gihun Lee, Seungjong Noh, Se-Young Yun Affiliations: KAIST, SK hynix Link: https://arxiv.org/abs/2106.15499 Abstract: This paper proposes a novel contrastive learning framework, coined as Self-Contrastive (SelfCon) Learning, that self-contrasts within multiple outputs from the different levels of a network. We confirmed that SelfCon loss guarantees the lower bound of mutual information (MI) between the intermediate and last representations. Besides, we empirically showed, via various MI estimators, that SelfCon loss highly correlates to the increase of MI and better classification performance. In our experiments, SelfCon surpasses supervised contrastive (SupCon) learning without the need for a multi-viewed batch and at a lower computational cost. Especially on ResNet-18, we achieved top-1 classification accuracy of 76.45% for the CIFAR-100 dataset, which is 2.87% and 4.36% higher than SupCon and cross-entropy loss, respectively. We found that mitigating both the vanishing gradient and overfitting issues makes our method outperform its counterparts.
【4】 Effective Evaluation of Deep Active Learning on Image Classification Tasks
Authors: Nathan Beck, Durga Sivasubramanian, Apurva Dani, Ganesh Ramakrishnan, Rishabh Iyer Affiliations: The University of Texas at Dallas, Indian Institute of Technology, Bombay, AIFY Innovation Labs Comments: 9 pages in main paper, 6 figures in main paper, 3 tables in main paper. 23 pages in total, 15 figures in total, 14 tables in total. Submitted to and currently under review for NeurIPS 2021 Link: https://arxiv.org/abs/2106.15324 Abstract: With the goal of making deep learning more label-efficient, a growing number of papers have been studying active learning (AL) for deep models. However, there are a number of issues in the prevalent experimental settings, mainly stemming from a lack of unified implementation and benchmarking. Issues in the current literature include sometimes contradictory observations on the performance of different AL algorithms, unintended exclusion of important generalization approaches such as data augmentation and SGD for optimization, a lack of study of evaluation facets like the labeling efficiency of AL, and little or no clarity on the scenarios in which AL outperforms random sampling (RS). In this work, we present a unified re-implementation of state-of-the-art AL algorithms in the context of image classification, and we carefully study these issues as facets of effective evaluation. On the positive side, we show that AL techniques are 2x to 4x more label-efficient compared to RS with the use of data augmentation. Surprisingly, when data augmentation is included, there is no longer a consistent gain in using BADGE, a state-of-the-art approach, over simple uncertainty sampling. We then do a careful analysis of how existing approaches perform with varying amounts of redundancy and number of examples per class. Finally, we provide several insights for AL practitioners to consider in future work, such as the effect of the AL batch size, the effect of initialization, the importance of retraining a new model at every round, and other insights.
【5】 SCARF: Self-Supervised Contrastive Learning using Random Feature Corruption
Authors: Dara Bahri, Heinrich Jiang, Yi Tay, Donald Metzler Affiliations: Google Research Link: https://arxiv.org/abs/2106.15147 Abstract: Self-supervised contrastive representation learning has proved incredibly successful in the vision and natural language domains, enabling state-of-the-art performance with orders of magnitude less labeled data. However, such methods are domain-specific and little has been done to leverage this technique on real-world tabular datasets. We propose SCARF, a simple, widely-applicable technique for contrastive learning, where views are formed by corrupting a random subset of features. When applied to pre-train deep neural networks on the 69 real-world, tabular classification datasets from the OpenML-CC18 benchmark, SCARF not only improves classification accuracy in the fully-supervised setting but does so also in the presence of label noise and in the semi-supervised setting where only a fraction of the available training data is labeled. We show that SCARF complements existing strategies and outperforms alternatives like autoencoders. We conduct comprehensive ablations, detailing the importance of a range of factors.
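The view construction described in the abstract above (corrupting a random subset of features) can be sketched for a single tabular row as follows. The abstract does not say how the corrupted values are chosen; drawing each replacement from the same column of a randomly picked row of the dataset is an assumption made here, as are the function name `corrupt_row` and the `corruption_rate` parameter.

```python
import random

def corrupt_row(row, table, corruption_rate, rng):
    """Return a corrupted view of `row`.

    A random subset of feature positions is selected, and each selected
    value is replaced by the value of the same feature in a randomly chosen
    row of `table` (assumption: replacements come from the empirical
    distribution of that column). The remaining features are kept unchanged.
    """
    n_features = len(row)
    n_corrupt = int(corruption_rate * n_features)
    chosen = set(rng.sample(range(n_features), n_corrupt))
    return [rng.choice(table)[j] if j in chosen else row[j]
            for j in range(n_features)]

# Hypothetical usage: the original row and its corrupted view form a positive pair.
rng = random.Random(0)
table = [[5.1, 3.5, 1.4, 0.2], [7.0, 3.2, 4.7, 1.4], [6.3, 3.3, 6.0, 2.5]]
anchor = table[0]
view = corrupt_row(anchor, table, corruption_rate=0.5, rng=rng)
```

In a contrastive setup the encoder would then be trained so that `anchor` and `view` map to nearby representations while views of other rows are pushed apart; that training loop is outside the scope of this sketch.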
【6】 Cross-domain error minimization for unsupervised domain adaptation
Authors: Yuntao Du, Yinghao Chen, Fengli Cui, Xiaowen Zhang, Chongjun Wang Affiliations: State Key Laboratory for Novel Software Technology at Nanjing University, Nanjing University, Nanjing, China Comments: Accepted by DASFAA 2021 Link: https://arxiv.org/abs/2106.15057 Abstract: Unsupervised domain adaptation aims to transfer knowledge from a labeled source domain to an unlabeled target domain. Previous methods focus on learning domain-invariant features to decrease the discrepancy between the feature distributions as well as minimizing the source error and have made remarkable progress. However, a recently proposed theory reveals that such a strategy is not sufficient for a successful domain adaptation. It shows that besides a small source error, both the discrepancy between the feature distributions and the discrepancy between the labeling functions should be small across domains. The discrepancy between the labeling functions is essentially the cross-domain error, which is ignored by existing methods. To overcome this issue, in this paper, a novel method is proposed to integrate all the objectives into a unified optimization framework. Moreover, the incorrect pseudo labels widely used in previous methods can lead to error accumulation during learning. To alleviate this problem, the pseudo labels are obtained by utilizing structural information of the target domain in addition to the source classifier, and we propose a curriculum learning based strategy to select the target samples with more accurate pseudo-labels during training. Comprehensive experiments are conducted, and the results validate that our approach outperforms state-of-the-art methods.
Transfer | Zero/Few/One-Shot | Adaptation (4 papers)
【1】 Zoo-Tuning: Adaptive Transfer from a Zoo of Models
Authors: Yang Shu, Zhi Kou, Zhangjie Cao, Jianmin Wang, Mingsheng Long Affiliations: School of Software, Tsinghua University Comments: Accepted by ICML 2021 Link: https://arxiv.org/abs/2106.15434 Abstract: With the development of deep networks on various large-scale datasets, a large zoo of pretrained models is available. When transferring from a model zoo, applying classic single-model based transfer learning methods to each source model suffers from high computational burden and cannot fully utilize the rich knowledge in the zoo. We propose \emph{Zoo-Tuning} to address these challenges, which learns to adaptively transfer the parameters of pretrained models to the target task. With the learnable channel alignment layer and adaptive aggregation layer, Zoo-Tuning \emph{adaptively aggregates channel aligned pretrained parameters} to derive the target model, which promotes knowledge transfer by simultaneously adapting multiple source models to downstream tasks. The adaptive aggregation substantially reduces the computation cost at both training and inference. We further propose lite Zoo-Tuning with the temporal ensemble of batch average gating values to reduce the storage cost at the inference time. We evaluate our approach on a variety of tasks, including reinforcement learning, image classification, and facial landmark detection. Experiment results demonstrate that the proposed adaptive transfer learning approach can transfer knowledge from a zoo of models more effectively and efficiently.
【2】 High-dimensional separability for one- and few-shot learning
Authors: Alexander N. Gorban, Bogdan Grechuk, Evgeny M. Mirkes, Sergey V. Stasenko, Ivan Y. Tyukin Affiliations: Department of Mathematics, University of Leicester, Leicester, UK, Lobachevsky University, Nizhni Novgorod, Russia, Department of Geoscience and Petroleum, Norwegian University of Science and Technology Link: https://arxiv.org/abs/2106.15416 Abstract: This work is driven by a practical question, corrections of Artificial Intelligence (AI) errors. Systematic re-training of a large AI system is hardly possible. To solve this problem, special external devices, correctors, are developed. They should provide quick and non-iterative system fix without modification of a legacy AI system. A common universal part of the AI corrector is a classifier that should separate undesired and erroneous behavior from normal operation. Training of such classifiers is a grand challenge at the heart of the one- and few-shot learning methods. Effectiveness of one- and few-shot methods is based on either significant dimensionality reductions or the blessing of dimensionality effects. Stochastic separability is a blessing of dimensionality phenomenon that allows one- and few-shot error correction: in high-dimensional datasets under broad assumptions each point can be separated from the rest of the set by a simple and robust linear discriminant. The hierarchical structure of data universe is introduced where each data cluster has a granular internal structure, etc. New stochastic separation theorems for the data distributions with fine-grained structure are formulated and proved. Separation theorems in infinite-dimensional limits are proven under assumptions of compact embedding of patterns into data space. New multi-correctors of AI systems are presented and illustrated with examples of predicting errors and learning new classes of objects by a deep convolutional neural network.
【3】 Adaptive Sample Selection for Robust Learning under Label Noise
Authors: Deep Patel, P. S. Sastry Affiliations: Department of Electrical Engineering, Indian Institute of Science, Bangalore, Karnataka Comments: Preprint. Under review Link: https://arxiv.org/abs/2106.15292 Abstract: Deep Neural Networks (DNNs) have been shown to be susceptible to memorization or overfitting in the presence of noisily labelled data. For the problem of robust learning under such noisy data, several algorithms have been proposed. A prominent class of algorithms rely on sample selection strategies, motivated by curriculum learning. For example, many algorithms use the 'small loss trick' wherein a fraction of samples with loss values below a certain threshold are selected for training. These algorithms are sensitive to such thresholds, and it is difficult to fix or learn these thresholds. Often, these algorithms also require information such as label noise rates which are typically unavailable in practice. In this paper, we propose a data-dependent, adaptive sample selection strategy that relies only on batch statistics of a given mini-batch to provide robustness against label noise. The algorithm does not have any additional hyperparameters for sample selection, does not need any information on noise rates, and does not need access to separate data with clean labels. We empirically demonstrate the effectiveness of our algorithm on benchmark datasets.
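The contrast drawn in the abstract above, between the "small loss trick" with a tuned threshold and a selection rule based only on mini-batch statistics, can be sketched as below. Both function names are hypothetical, and using the batch mean loss as the threshold is an illustrative assumption; the abstract states only that the proposed rule relies on batch statistics, not which statistic it uses.

```python
def select_small_loss(losses, keep_fraction):
    """Small-loss trick: keep the indices of the lowest-loss samples.

    `keep_fraction` is a hyperparameter that usually has to be tuned or
    derived from an estimate of the label noise rate.
    """
    k = max(1, int(keep_fraction * len(losses)))
    order = sorted(range(len(losses)), key=losses.__getitem__)
    return sorted(order[:k])

def select_by_batch_mean(losses):
    """Hyperparameter-free variant: keep samples whose loss does not exceed
    the mean loss of the current mini-batch (assumed threshold, see lead-in)."""
    mean_loss = sum(losses) / len(losses)
    return [i for i, loss in enumerate(losses) if loss <= mean_loss]

# Hypothetical per-sample losses for one mini-batch.
batch_losses = [0.1, 2.0, 0.3, 5.0]
kept_fixed = select_small_loss(batch_losses, keep_fraction=0.5)
kept_adaptive = select_by_batch_mean(batch_losses)
```

The second rule adapts automatically as the loss distribution shifts during training, which is the property the abstract emphasizes; the first requires choosing `keep_fraction` in advance.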
【4】 Test-Time Adaptation to Distribution Shift by Confidence Maximization and Input Transformation
Authors: Chaithanya Kumar Mummadi, Robin Hutmacher, Kilian Rambach, Evgeny Levinkov, Thomas Brox, Jan Hendrik Metzen Affiliations: University of Freiburg, Bosch Center for Artificial Intelligence Comments: 16 pages, 5 figures, 7 tables Link: https://arxiv.org/abs/2106.14999 Abstract: Deep neural networks often exhibit poor performance on data that is unlikely under the train-time data distribution, for instance data affected by corruptions. Previous works demonstrate that test-time adaptation to data shift, for instance using entropy minimization, effectively improves performance on such shifted distributions. This paper focuses on the fully test-time adaptation setting, where only unlabeled data from the target distribution is required. This allows adapting arbitrary pretrained networks. Specifically, we propose a novel loss that improves test-time adaptation by addressing both premature convergence and instability of entropy minimization. This is achieved by replacing the entropy by a non-saturating surrogate and adding a diversity regularizer based on batch-wise entropy maximization that prevents convergence to trivial collapsed solutions. Moreover, we propose to prepend an input transformation module to the network that can partially undo test-time distribution shifts. Surprisingly, this preprocessing can be learned solely using the fully test-time adaptation loss in an end-to-end fashion without any target domain labels or source domain data. We show that our approach outperforms previous work in improving the robustness of publicly available pretrained image classifiers to common corruptions on such challenging benchmarks as ImageNet-C.
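The interplay between entropy minimization and a batch-wise diversity term can be illustrated with a small sketch. The abstract does not specify the non-saturating surrogate, so the code below uses plain Shannon entropy instead. The function names and the toy predictions are assumptions, and this is not the loss proposed in the paper.

```python
import math


def entropy(probs):
    """Shannon entropy (in nats) of one probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def tta_objective(batch_probs):
    """Mean per-sample entropy minus the entropy of the batch-averaged prediction.

    Minimising the first term sharpens each prediction. Subtracting the second
    term rewards batches whose predictions are spread over the classes, which
    penalises the collapsed solution where every input maps to one class.
    """
    n = len(batch_probs)
    n_classes = len(batch_probs[0])
    mean_sample_entropy = sum(entropy(p) for p in batch_probs) / n
    marginal = [sum(p[c] for p in batch_probs) / n for c in range(n_classes)]
    return mean_sample_entropy - entropy(marginal)


# Both batches are equally confident per sample; only the first one has collapsed.
collapsed = [[0.98, 0.01, 0.01]] * 3
diverse = [[0.98, 0.01, 0.01], [0.01, 0.98, 0.01], [0.01, 0.01, 0.98]]

collapsed_value = tta_objective(collapsed)
diverse_value = tta_objective(diverse)
print(collapsed_value, diverse_value)
```

Per-sample entropy alone cannot tell the two batches apart. The diversity term gives the diverse batch the lower objective, which is the behaviour the regularizer is meant to encourage.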
Reinforcement Learning (3 papers)
【1】 Globally Optimal Hierarchical Reinforcement Learning for Linearly-Solvable Markov Decision Processes
Authors: Guillermo Infante, Anders Jonsson, Vicenç Gómez Affiliations: DTIC, Universitat Pompeu Fabra Link: https://arxiv.org/abs/2106.15380 Abstract: In this work we present a novel approach to hierarchical reinforcement learning for linearly-solvable Markov decision processes. Our approach assumes that the state space is partitioned, and the subtasks consist in moving between the partitions. We represent value functions on several levels of abstraction, and use the compositionality of subtasks to estimate the optimal values of the states in each partition. The policy is implicitly defined on these optimal value estimates, rather than being decomposed among the subtasks. As a consequence, our approach can learn the globally optimal policy, and does not suffer from the non-stationarity of high-level decisions. If several partitions have equivalent dynamics, the subtasks of those partitions can be shared. If the set of boundary states is smaller than the entire state space, our approach can have significantly smaller sample complexity than that of a flat learner, and we validate this empirically in several experiments.
【2】 DRILL -- Deep Reinforcement Learning for Refinement Operators in $\mathcal{ALC}$
Authors: Caglar Demir, Axel-Cyrille Ngonga Ngomo Affiliations: Data Science Research Group, Paderborn University Link: https://arxiv.org/abs/2106.15373 Abstract: Approaches based on refinement operators have been successfully applied to class expression learning on RDF knowledge graphs. These approaches often need to explore a large number of concepts to find adequate hypotheses. This need arguably stems from current approaches relying on myopic heuristic functions to guide their search through an infinite concept space. In turn, deep reinforcement learning provides effective means to address myopia by estimating how much discounted cumulated future reward states promise. In this work, we leverage deep reinforcement learning to accelerate the learning of concepts in $\mathcal{ALC}$ by proposing DRILL, a novel class expression learning approach that uses a convolutional deep Q-learning model to steer its search. By virtue of its architecture, DRILL is able to compute the expected discounted cumulated future reward of more than $10^3$ class expressions in a second on standard hardware. We evaluate DRILL on four benchmark datasets against state-of-the-art approaches. Our results suggest that DRILL converges to goal states at least 2.7$\times$ faster than state-of-the-art models on all benchmark datasets. We provide an open-source implementation of our approach, including training and evaluation scripts as well as pre-trained models.
【3】 Modularity in Reinforcement Learning via Algorithmic Independence in Credit Assignment
Authors: Michael Chang, Sidhant Kaushik, Sergey Levine, Thomas L. Griffiths Affiliations: Department of Computer Science, USA, Department of Computer Science, Princeton University Comments: Long Presentation at the Thirty-eighth International Conference on Machine Learning (ICML) 2021. 21 pages, 11 figures Link: https://arxiv.org/abs/2106.14993 Abstract: Many transfer problems require re-using previously optimal decisions for solving new tasks, which suggests the need for learning algorithms that can modify the mechanisms for choosing certain actions independently of those for choosing others. However, there is currently no formalism or theory for how to achieve this kind of modular credit assignment. To answer this question, we define modular credit assignment as a constraint on minimizing the algorithmic mutual information among feedback signals for different decisions. We introduce what we call the modularity criterion for testing whether a learning algorithm satisfies this constraint by performing causal analysis on the algorithm itself. We generalize the recently proposed societal decision-making framework as a more granular formalism than the Markov decision process to prove that for decision sequences that do not contain cycles, certain single-step temporal difference action-value methods meet this criterion while all policy-gradient methods do not. Empirical evidence suggests that such action-value methods are more sample efficient than policy-gradient methods on transfer problems that require only sparse changes to a sequence of previously optimal decisions.
Meta-Learning (3 papers)
【1】 A Representation Learning Perspective on the Importance of Train-Validation Splitting in Meta-Learning
Authors: Nikunj Saunshi, Arushi Gupta, Wei Hu Affiliations: Department of Computer Science, Princeton University Comments: In proceedings of ICML 2021 Link: https://arxiv.org/abs/2106.15615 Abstract: An effective approach in meta-learning is to utilize multiple "train tasks" to learn a good initialization for model parameters that can help solve unseen "test tasks" with very few samples by fine-tuning from this initialization. Although successful in practice, theoretical understanding of such methods is limited. This work studies an important aspect of these methods: splitting the data from each task into train (support) and validation (query) sets during meta-training. Inspired by recent work (Raghu et al., 2020), we view such meta-learning methods through the lens of representation learning and argue that the train-validation split encourages the learned representation to be low-rank without compromising on expressivity, as opposed to the non-splitting variant that encourages high-rank representations. Since sample efficiency benefits from low-rankness, the splitting strategy will require very few samples to solve unseen test tasks. We present theoretical results that formalize this idea for linear representation learning on a subspace meta-learning instance, and experimentally verify this practical benefit of splitting in simulations and on standard meta-learning benchmarks.
【2】 Fast Training of Neural Lumigraph Representations using Meta Learning
Authors: Alexander W. Bergman, Petr Kellnhofer, Gordon Wetzstein Affiliations: Stanford University Comments: Project website: computationalimaging.org/publications/metanlr Link: https://arxiv.org/abs/2106.14942 Abstract: Novel view synthesis is a long-standing problem in machine learning and computer vision. Significant progress has recently been made in developing neural scene representations and rendering techniques that synthesize photorealistic images from arbitrary views. These representations, however, are extremely slow to train and often also slow to render. Inspired by neural variants of image-based rendering, we develop a new neural rendering approach with the goal of quickly learning a high-quality representation which can also be rendered in real-time. Our approach, MetaNLR++, accomplishes this by using a unique combination of a neural shape representation and 2D CNN-based image feature extraction, aggregation, and re-projection. To push representation convergence times down to minutes, we leverage meta learning to learn neural shape and image feature priors which accelerate training. The optimized shape and image features can then be extracted using traditional graphics techniques and rendered in real time. We show that MetaNLR++ achieves similar or better novel view synthesis results in a fraction of the time that competing methods require.
【3】 Meta-learning for Matrix Factorization without Shared Rows or Columns
Authors: Tomoharu Iwata Affiliations: NTT Communication Science Laboratories Link: https://arxiv.org/abs/2106.15133 Abstract: We propose a method that meta-learns knowledge about matrix factorization from various matrices, and uses that knowledge for factorizing unseen matrices. The proposed method uses a neural network that takes a matrix as input, and generates prior distributions of factorized matrices of the given matrix. The neural network is meta-learned such that the expected imputation error is minimized when the factorized matrices are adapted to each matrix by maximum a posteriori (MAP) estimation. We use a gradient descent method for the MAP estimation, which enables us to backpropagate the expected imputation error through the gradient descent steps for updating neural network parameters, since each gradient descent step is written in a closed form and is differentiable. The proposed method can meta-learn from matrices even when their rows and columns are not shared, and their sizes are different from each other. In our experiments with three user-item rating datasets, we demonstrate that our proposed method can impute the missing values from a limited number of observations in unseen matrices after being trained with different matrices.
Medical (2 papers)
【1】 Sounds of COVID-19: exploring realistic performance of audio-based digital testing
Authors: Jing Han, Tong Xia, Dimitris Spathis, Erika Bondareva, Chloë Brown, Jagmohan Chauhan, Ting Dang, Andreas Grammenos, Apinan Hasthanasombat, Andres Floto, Pietro Cicuta, Cecilia Mascolo Affiliations: Department of Computer Science and Technology, University of Cambridge, UK, Department of Medicine, University of Cambridge, UK, Department of Physics, University of Cambridge, UK, ECS, University of Southampton, UK Link: https://arxiv.org/abs/2106.15523 Abstract: Researchers have been battling with the question of how we can identify Coronavirus disease (COVID-19) cases efficiently, affordably and at scale. Recent work has shown how audio-based approaches, which collect respiratory audio data (cough, breathing and voice), can be used for testing; however, there is a lack of exploration of how biases and methodological decisions impact these tools' performance in practice. In this paper, we explore the realistic performance of audio-based digital testing of COVID-19. To investigate this, we collected a large crowdsourced respiratory audio dataset through a mobile app, alongside recent COVID-19 test results and symptoms intended as a ground truth. Within the collected dataset, we selected 5,240 samples from 2,478 participants and split them into different participant-independent sets for model development and validation. Among these, we controlled for potential confounding factors (such as demographics and language). The unbiased model takes features extracted from breathing, coughs, and voice signals as predictors and yields an AUC-ROC of 0.71 (95% CI: 0.65-0.77). We further explore different unbalanced distributions to show how biases and participant splits affect performance. Finally, we discuss how the realistic model presented could be integrated in clinical practice to realize continuous, ubiquitous, sustainable and affordable testing at population scale.
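The AUC-ROC figure reported here is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. A minimal sketch of that definition follows; the labels and scores are made-up toy values, not data from the study, and the pairwise loop is chosen for clarity rather than speed.

```python
def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Cost is O(P * N), fine for small examples.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical test outcomes (1 = positive) and model scores.
labels = [1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]
auc = auc_roc(labels, scores)
print(auc)
```

In this toy example 9 of the 12 positive-negative pairs are ordered correctly, so the AUC is 0.75. A value of 0.5 corresponds to random ranking.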
【2】 Data augmentation for deep learning based accelerated MRI reconstruction with limited data
Authors: Zalan Fabian, Reinhard Heckel, Mahdi Soltanolkotabi Comments: 27 pages, 19 figures, to be published in ICML 2021 Link: https://arxiv.org/abs/2106.14947 Abstract: Deep neural networks have emerged as very successful tools for image restoration and reconstruction tasks. These networks are often trained end-to-end to directly reconstruct an image from a noisy or corrupted measurement of that image. To achieve state-of-the-art performance, training on large and diverse sets of images is considered critical. However, it is often difficult and/or expensive to collect large amounts of training images. Inspired by the success of Data Augmentation (DA) for classification problems, in this paper, we propose a pipeline for data augmentation for accelerated MRI reconstruction and study its effectiveness at reducing the required training data in a variety of settings. Our DA pipeline, MRAugment, is specifically designed to utilize the invariances present in medical imaging measurements, because naive DA strategies that neglect the physics of the problem fail. Through extensive studies on multiple datasets we demonstrate that in the low-data regime DA prevents overfitting and can match or even surpass the state of the art while using significantly fewer training data, whereas in the high-data regime it has diminishing returns. Furthermore, our findings show that DA can improve the robustness of the model against various shifts in the test distribution.
Distillation | Knowledge Extraction (1 paper)
【1】 ScanBank: A Benchmark Dataset for Figure Extraction from Scanned Electronic Theses and Dissertations
Authors: Sampanna Yashwant Kahu, William A. Ingram, Edward A. Fox, Jian Wu Affiliations: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Blacksburg, VA, Department of Computer Science, Old Dominion University, Norfolk, VA Comments: 16 pages, 3 figures, submitted to ACM/IEEE Joint Conference on Digital Libraries Link: https://arxiv.org/abs/2106.15320 Abstract: We focus on electronic theses and dissertations (ETDs), aiming to improve access and expand their utility, since more than 6 million are publicly available, and they constitute an important corpus to aid research and education across disciplines. The corpus is growing as new born-digital documents are included, and since millions of older theses and dissertations have been converted to digital form to be disseminated electronically in institutional repositories. In ETDs, as with other scholarly works, figures and tables can communicate a large amount of information in a concise way. Although methods have been proposed for extracting figures and tables from born-digital PDFs, they do not work well with scanned ETDs. Considering this problem, our assessment of state-of-the-art figure extraction systems is that the reason they do not function well on scanned PDFs is that they have only been trained on born-digital documents. To address this limitation, we present ScanBank, a new dataset containing 10 thousand scanned page images, manually labeled by humans as to the presence of the 3.3 thousand figures or tables found therein. We use this dataset to train a deep neural network model based on YOLOv5 to accurately extract figures and tables from scanned ETDs. We pose and answer important research questions aimed at finding better methods for figure extraction from scanned documents. One of these concerns the training value of data augmentation techniques applied to born-digital documents, which are used to train models better suited for figure extraction from scanned documents. To the best of our knowledge, ScanBank is the first manually annotated dataset for figure and table extraction for scanned ETDs. A YOLOv5-based model, trained on ScanBank, outperforms existing comparable open-source and freely available baseline methods by a considerable margin.
Recommendation (1 paper)
【1】 On component interactions in two-stage recommender systems
Authors: Jiri Hron, Karl Krauth, Michael I. Jordan, Niki Kilbertus Affiliations: University of Cambridge, UC Berkeley, Helmholtz AI, Munich Link: https://arxiv.org/abs/2106.14979 Abstract: Thanks to their scalability, two-stage recommenders are used by many of today's largest online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce recommendations in two steps: (i) multiple nominators (tuned for low prediction latency) preselect a small subset of candidates from the whole item pool; (ii) a slower but more accurate ranker further narrows down the nominated items, and serves them to the user. Despite their popularity, the literature on two-stage recommenders is relatively scarce, and the algorithms are often treated as the sum of their parts. Such treatment presupposes that the two-stage performance is explained by the behavior of individual components if they were deployed independently. This is not the case: using synthetic and real-world data, we demonstrate that interactions between the ranker and the nominators substantially affect the overall performance. Motivated by these findings, we derive a generalization lower bound which shows that careful choice of each nominator's training set is sometimes the only difference between a poor and an optimal two-stage recommender. Since searching for a good choice manually is difficult, we learn one instead. In particular, using a Mixture-of-Experts approach, we train the nominators (experts) to specialize on different subsets of the item pool. This significantly improves performance.
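The two-step pipeline described in this abstract can be sketched as follows. Everything here is a toy assumption made for illustration: the catalogue, the two nominators, the ranker and their scoring rules are hypothetical and are not taken from the paper.

```python
# Hypothetical item pool with hand-picked attributes.
CATALOGUE = {
    "a": {"popularity": 0.9, "topic": "music", "quality": 0.20},
    "b": {"popularity": 0.8, "topic": "sport", "quality": 0.40},
    "c": {"popularity": 0.3, "topic": "music", "quality": 0.90},
    "d": {"popularity": 0.1, "topic": "music", "quality": 0.95},
    "e": {"popularity": 0.2, "topic": "sport", "quality": 0.50},
}


def popularity_nominator(user, item):
    """Cheap nominator: ignores the user and scores by global popularity."""
    return CATALOGUE[item]["popularity"]


def topic_nominator(user, item):
    """Cheap nominator: topic match first, popularity as a small tie-breaker."""
    match = 1.0 if CATALOGUE[item]["topic"] == user["topic"] else 0.0
    return match + 0.1 * CATALOGUE[item]["popularity"]


def ranker(user, item):
    """Slower, more accurate scorer (here: topic match plus item quality)."""
    match = 1.0 if CATALOGUE[item]["topic"] == user["topic"] else 0.0
    return match + CATALOGUE[item]["quality"]


def two_stage_recommend(user, items, nominators, rank, per_nominator=2, top_k=2):
    """Stage 1: each nominator preselects a few candidates. Stage 2: rank them."""
    candidates = set()
    for nominate in nominators:
        best = sorted(items, key=lambda item: nominate(user, item), reverse=True)
        candidates.update(best[:per_nominator])
    return sorted(candidates, key=lambda item: rank(user, item), reverse=True)[:top_k]


user = {"topic": "music"}
recommended = two_stage_recommend(
    user, list(CATALOGUE), [popularity_nominator, topic_nominator], ranker
)
print(recommended)
```

Item "d" has the highest ranker score for this user but is never nominated, so the ranker cannot return it. This is a small instance of the component interaction the abstract describes: the overall result depends on how the nominators and the ranker fit together, not on each in isolation.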
Autonomous Driving | Vehicles | Lane Detection, etc. (1 paper)
【1】 Autonomous Driving Implementation in an Experimental Environment
Authors: Namig Aliyev, Oguzhan Sezer, Mehmet Turan Guzel Affiliations: Department of Computer Engineering, Sakarya University Comments: 8 pages, 21 figures. This is a bachelor's thesis research report and was supported by the Scientific and Technological Research Council of Turkey Link: https://arxiv.org/abs/2106.15274 Abstract: Autonomous systems need to identify their environment, and there is a long way to go before they can be put safely into practice. In autonomous driving systems, the detection of obstacles and traffic lights is as important as lane tracking. In this study, an autonomous driving system is developed and tested in an experimental environment designed for this purpose. In this system, a model vehicle with a camera is used to trace the lanes and avoid obstacles in order to study autonomous driving behavior experimentally. Convolutional Neural Network models were trained for lane tracking. For the vehicle to avoid obstacles, corner detection, optical flow, focus of expansion, time to collision, balance calculation, and a decision mechanism were created, respectively.
Point Clouds | SLAM | Radar | LiDAR | Depth and RGB-D (2 papers)
【1】 Predicting Depth from Semantic Segmentation using Game Engine Dataset
Authors: Mohammad Amin Kashi Comments: 79 pages, Master's thesis at K. N. Toosi University of Technology, supervised by Professor Hamid D. Taghirad Link: https://arxiv.org/abs/2106.15257 Abstract: Depth perception is fundamental for robots to understand the surrounding environment. From the viewpoint of cognitive neuroscience, visual depth perception methods are divided into three categories, namely binocular, active, and pictorial. The first two categories have been studied for decades in detail. However, research exploring the third category is still in its infancy and has gained momentum with the advent of deep learning methods in recent years. In cognitive neuroscience, it is known that pictorial depth perception mechanisms are dependent on the perception of seen objects. Inspired by this fact, in this thesis, we investigated the relation between the perception of objects and depth estimation in convolutional neural networks. For this purpose, we developed new network structures based on a simple depth estimation network that only used a single image at its input. Our proposed structures use both an image and a semantic label of the image as their input. We used semantic labels as the output of object perception. The results of the performance comparison between the developed network and the original network showed that our novel structures can improve the performance of depth estimation by 52% of relative error of distance in the examined cases. Most of the experimental studies were carried out on synthetic datasets that were generated by game engines to isolate the performance comparison from the effect of inaccurate depth and semantic labels of non-synthetic datasets. It is shown that particular synthetic datasets may be used for training of depth networks in cases where an appropriate dataset is not available. Furthermore, we showed that in these cases, usage of semantic labels improves the robustness of the network against domain shift from synthetic training data to non-synthetic test data.
【2】 Limited depth bandit-based strategy for Monte Carlo planning in continuous action spaces
Authors: Ricardo Quinteiro, Francisco S. Melo, Pedro A. Santos Affiliations: Instituto Superior Técnico, University of Lisbon, Portugal, INESC-ID, Lisbon, Portugal Link: https://arxiv.org/abs/2106.15594 Abstract: This paper addresses the problem of optimal control using search trees. We start by considering multi-armed bandit problems with continuous action spaces and propose LD-HOO, a limited depth variant of the hierarchical optimistic optimization (HOO) algorithm. We provide a regret analysis for LD-HOO and show that, asymptotically, our algorithm exhibits the same cumulative regret as the original HOO while being faster and more memory efficient. We then propose a Monte Carlo tree search algorithm based on LD-HOO for optimal control problems and illustrate the resulting approach's application in several optimal control problems.
Federated Learning | Privacy Protection | Encryption (4 papers)
【1】 Personalized Federated Learning with Gaussian Processes
Authors: Idan Achituve, Aviv Shamsian, Aviv Navon, Gal Chechik, Ethan Fetaya Affiliations: Bar-Ilan University, Israel, NVIDIA, Israel Link: https://arxiv.org/abs/2106.15482 Abstract: Federated learning aims to learn a global model that performs well on client devices with limited cross-client communication. Personalized federated learning (PFL) further extends this setup to handle data heterogeneity between clients by learning personalized models. A key challenge in this setting is to learn effectively across clients even though each client has unique data that is often limited in size. Here we present pFedGP, a solution to PFL that is based on Gaussian processes (GPs) with deep kernel learning. GPs are highly expressive models that work well in the low data regime due to their Bayesian nature. However, applying GPs to PFL raises multiple challenges. Mainly, GP performance depends heavily on access to a good kernel function, and learning a kernel requires a large training set. Therefore, we propose learning a shared kernel function across all clients, parameterized by a neural network, with a personal GP classifier for each client. We further extend pFedGP to include inducing points using two novel methods: the first helps to improve generalization in the low data regime and the second reduces the computational cost. We derive a PAC-Bayes generalization bound on novel clients and empirically show that it gives non-vacuous guarantees. Extensive experiments on standard PFL benchmarks with CIFAR-10, CIFAR-100, and CINIC-10, and on a new setup of learning under input noise show that pFedGP achieves well-calibrated predictions while significantly outperforming baseline methods, reaching up to 21% in accuracy gain.
【2】 A Comprehensive Survey of Incentive Mechanism for Federated Learning
Authors: Rongfei Zeng, Chao Zeng, Xingwei Wang, Bo Li, Xiaowen Chu Affiliations: Software College, Northeastern University, Shenyang, China, Department of Computer Science, Northeastern University, Shenyang, China, Department of Computer Science and Engineering, HKUST, Hong Kong, China Comments: more than 10 pages Link: https://arxiv.org/abs/2106.15406 Abstract: Federated learning utilizes various resources provided by participants to collaboratively train a global model, which potentially addresses the data privacy issue of machine learning. In such a promising paradigm, performance deteriorates without sufficient training data and other resources in the learning process. Thus, it is crucial to motivate more participants to contribute their valuable resources to federated learning in return for some payment. In this paper, we present a comprehensive survey of incentive schemes for federated learning. Specifically, we identify the incentive problem in federated learning and then provide a taxonomy for various schemes. Subsequently, we summarize the existing incentive mechanisms in terms of the main techniques, such as Stackelberg game, auction, contract theory, Shapley value, reinforcement learning, and blockchain. By reviewing and comparing some impressive results, we identify three directions for future study.
【3】 Federated Learning for Intrusion Detection in IoT Security: A Hybrid Ensemble Approach
Authors: Sayan Chatterjee, Manjesh K. Hanawal Affiliations: IEOR, IIT Bombay, Mumbai, India Comments: 11 pages, 3 figures Link: https://arxiv.org/abs/2106.15349 Abstract: The critical role of the Internet of Things (IoT) in domains such as smart cities, healthcare, supply chains and transportation has made it a target of malicious attacks. Past works in this area focused on centralized Intrusion Detection Systems (IDS), assuming the existence of a central entity to perform data analysis and identify threats. However, such an IDS may not always be feasible, mainly because data is spread across multiple sources and gathering it at a central node can be costly. Also, the earlier works primarily focused on improving the True Positive Rate (TPR) and ignored the False Positive Rate (FPR), which is also essential to avoid unnecessary downtime of the systems. In this paper, we first present an architecture for IDS based on a hybrid ensemble model, named PHEC, which gives improved performance compared to state-of-the-art architectures. We then adapt this model to a federated learning framework that performs local training and aggregates only the model parameters. Next, we propose Noise-Tolerant PHEC in centralized and federated settings to address the label-noise problem. The proposed idea uses classifiers with weighted convex surrogate loss functions. The natural robustness of the KNN classifier towards noisy data is also used in the proposed architecture. Experimental results on four benchmark datasets drawn from various security attacks show that our model achieves high TPR while keeping FPR low on noisy and clean data. Further, they also demonstrate that the hybrid ensemble models achieve performance in federated settings close to that of the centralized settings.
【4】 Achieving Statistical Optimality of Federated Learning: Beyond Stationary Points
Authors: Lili Su, Jiaming Xu, Pengkun Yang Affiliations: Electrical and Computer Engineering, Northeastern University; The Fuqua School of Business, Duke University; Center for Statistical Science, Tsinghua University Link: https://arxiv.org/abs/2106.15216 Abstract: Federated Learning (FL) is a promising framework that has great potential in privacy preservation and in lowering the computation load at the cloud. FedAvg and FedProx are two widely adopted algorithms. However, recent work raised concerns about these two methods: (1) their fixed points do not correspond to the stationary points of the original optimization problem, and (2) the common model found might not generalize well locally. In this paper, we alleviate these concerns. Towards this, we adopt the statistical learning perspective yet allow the distributions to be heterogeneous and the local data to be unbalanced. We show, in the general kernel regression setting, that both FedAvg and FedProx converge to the minimax-optimal error rates. Moreover, when the kernel function has a finite rank, the convergence is exponentially fast. Our results further analytically quantify the impact of the model heterogeneity and characterize the federation gain - the reduction of the estimation error for a worker to join the federated learning compared to the best local estimator. To the best of our knowledge, we are the first to show the achievability of minimax error rates under FedAvg and FedProx, and the first to characterize the gains in joining FL. Numerical experiments further corroborate our theoretical findings on the statistical optimality of FedAvg and FedProx and the federation gains.
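As a rough illustration of the FedAvg scheme named in the abstract above, the sketch below runs it on a toy problem. It assumes a one-dimensional, noise-free least-squares model rather than the kernel regression setting analysed in the paper, and the helper names (`local_sgd`, `fedavg`), learning rate, and client sizes are all hypothetical choices made for the example.

```python
import random

def local_sgd(w, data, lr, steps):
    """Run a few full-batch gradient steps of 1-D least squares on one client's data."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def fedavg(clients, rounds=50, lr=0.05, local_steps=5):
    """FedAvg: every round each client starts from the global model, trains
    locally, and the server averages the results weighted by sample count."""
    w_global = 0.0
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        local = [local_sgd(w_global, d, lr, local_steps) for d in clients]
        w_global = sum(len(d) * w for d, w in zip(clients, local)) / total
    return w_global

rng = random.Random(0)
true_w = 3.0
# Two unbalanced clients whose inputs come from different ranges
# (a toy stand-in for heterogeneous local distributions).
clients = [
    [(x, true_w * x) for x in (rng.uniform(-1, 1) for _ in range(5))],
    [(x, true_w * x) for x in (rng.uniform(0, 2) for _ in range(40))],
]
w_hat = fedavg(clients)
print(w_hat)
```

Because the toy data are noise-free and share one true parameter, every client has the same minimiser and the averaged model recovers it; the fixed-point concerns discussed in the abstract arise when local objectives disagree.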
Reasoning|Analysis|Understanding|Explanation (5 papers)
【1】 Near-Optimal Explainable k-Means for All Dimensions
Authors: Moses Charikar, Lunjia Hu Affiliations: Stanford University Comments: 31 pages Link: https://arxiv.org/abs/2106.15566 Abstract: Many clustering algorithms are guided by certain cost functions such as the widely-used $k$-means cost. These algorithms divide data points into clusters with often complicated boundaries, creating difficulties in explaining the clustering decision. In a recent work, Dasgupta, Frost, Moshkovitz, and Rashtchian (ICML'20) introduced explainable clustering, where the cluster boundaries are axis-parallel hyperplanes and the clustering is obtained by applying a decision tree to the data. The central question here is: how much does the explainability constraint increase the value of the cost function? Given $d$-dimensional data points, we show an efficient algorithm that finds an explainable clustering whose $k$-means cost is at most $k^{1 - 2/d}\mathrm{poly}(d\log k)$ times the minimum cost achievable by a clustering without the explainability constraint, assuming $k,d\ge 2$. Combining this with an independent work by Makarychev and Shan (ICML'21), we get an improved bound of $k^{1 - 2/d}\mathrm{polylog}(k)$, which we show is optimal for every choice of $k,d\ge 2$ up to a poly-logarithmic factor in $k$. For $d = 2$ in particular, we show an $O(\log k\log\log k)$ bound, improving exponentially over the previous best bound of $\widetilde O(k)$.
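To make the notion of an explainable clustering concrete, the sketch below handles only the simplest case, $k = 2$, where the decision tree is a single axis-parallel cut. It brute-forces the cut with the lowest $k$-means cost; this is an illustrative baseline under that assumption, not the algorithm from the paper, and the function names and sample points are hypothetical.

```python
def kmeans_cost(clusters):
    """Sum of squared distances from each point to the mean of its cluster."""
    cost = 0.0
    for pts in clusters:
        if not pts:
            continue
        dim = len(pts[0])
        mean = [sum(p[j] for p in pts) / len(pts) for j in range(dim)]
        cost += sum((p[j] - mean[j]) ** 2 for p in pts for j in range(dim))
    return cost

def best_threshold_cut(points):
    """Try every axis-parallel cut (feature j, threshold t) that separates
    the points and return the one with the lowest 2-means cost."""
    best = None
    dim = len(points[0])
    for j in range(dim):
        values = sorted({p[j] for p in points})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # midpoint between consecutive distinct values
            left = [p for p in points if p[j] <= t]
            right = [p for p in points if p[j] > t]
            cost = kmeans_cost([left, right])
            if best is None or cost < best[0]:
                best = (cost, j, t)
    return best

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 0.0), (10.0, 1.0), (11.0, 1.0)]
cost, feature, threshold = best_threshold_cut(points)
print(f"split on feature {feature} at {threshold}, cost {cost:.3f}")
```

The resulting rule ("feature 0 at most 5.5") is the kind of explanation a threshold tree provides; the abstract's bounds concern how much such rules can inflate the cost relative to unconstrained clusterings as $k$ and $d$ grow.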
【2】 Interactive Dimensionality Reduction for Comparative Analysis
Authors: Takanori Fujiwara, Xinhai Wei, Jian Zhao, Kwan-Liu Ma Affiliations: University of California (Takanori Fujiwara, Kwan-Liu Ma) Comments: This manuscript is currently under review Link: https://arxiv.org/abs/2106.15481 Abstract: Finding the similarities and differences between two or more groups of datasets is a fundamental analysis task. For high-dimensional data, dimensionality reduction (DR) methods are often used to find the characteristics of each group. However, existing DR methods provide limited capability and flexibility for such comparative analysis as each method is designed only for a narrow analysis target, such as identifying factors that most differentiate groups. In this work, we introduce an interactive DR framework where we integrate our new DR method, called ULCA (unified linear comparative analysis), with an interactive visual interface. ULCA unifies two DR schemes, discriminant analysis and contrastive learning, to support various comparative analysis tasks. To provide flexibility for comparative analysis, we develop an optimization algorithm that enables analysts to interactively refine ULCA results. Additionally, we provide an interactive visualization interface to examine ULCA results with a rich set of analysis libraries. We evaluate ULCA and the optimization algorithm to show their efficiency as well as present multiple case studies using real-world datasets to demonstrate the usefulness of our framework.
【3】 Counterfactual Explanations for Arbitrary Regression Models
Authors: Thomas Spooner, Danial Dervovic, Jason Long, Jon Shepard, Jiahao Chen, Daniele Magazzeni Affiliations: J. P. Morgan AI Research Comments: 20 pages, 5 figures, 3 tables Link: https://arxiv.org/abs/2106.15212 Abstract: We present a new method for counterfactual explanations (CFEs) based on Bayesian optimisation that applies to both classification and regression models. Our method is a globally convergent search algorithm with support for arbitrary regression models and constraints like feature sparsity and actionable recourse, and furthermore can answer multiple counterfactual questions in parallel while learning from previous queries. We formulate CFE search for regression models in a rigorous mathematical framework using differentiable potentials, which resolves robustness issues in threshold-based objectives. We prove that in this framework, (a) verifying the existence of counterfactuals is NP-complete; and (b) finding instances using such potentials is CLS-complete. We describe a unified algorithm for CFEs using a specialised acquisition function that composes both expected improvement and an exponential-polynomial (EP) family with desirable properties. Our evaluation on real-world benchmark domains demonstrates high sample-efficiency and precision.
【4】 Towards Understanding the Effectiveness of Attention Mechanism
Authors: Xiang Ye, Zihang He, Heng Wang, Yong Li Affiliations: Xiang Ye is with the School of Electronic Engineering, Beijing University of Posts and Telecommunications, Beijing; Yong Li is with the School of Electronic Engineering Link: https://arxiv.org/abs/2106.15067 Abstract: Attention Mechanism is a widely used method for improving the performance of convolutional neural networks (CNNs) on computer vision tasks. Despite its pervasiveness, we have a poor understanding of what its effectiveness stems from. It is popularly believed that its effectiveness stems from the visual attention explanation, advocating focusing on the important part of input data rather than ingesting the entire input. In this paper, we find that there is only a weak consistency between the attention weights of features and their importance. Instead, we verify the crucial role of feature map multiplication in attention mechanism and uncover a fundamental impact of feature map multiplication on the learned landscapes of CNNs: with the high order non-linearity brought by the feature map multiplication, it played a regularization role on CNNs, which made them learn smoother and more stable landscapes near real samples compared to vanilla CNNs. This smoothness and stability induce a more predictive and stable behavior in-between real samples, and make CNNs generate better. Moreover, motivated by the proposed effectiveness of feature map multiplication, we design feature map multiplication network (FMMNet) by simply replacing the feature map addition in ResNet with feature map multiplication. FMMNet outperforms ResNet on various datasets, and this indicates that feature map multiplication plays a vital role in improving the performance even without finely designed attention mechanism in existing methods.
【5】 US Fatal Police Shooting Analysis and Prediction
Authors: Yuan Wang, Yangxin Fan Affiliations: University of Rochester Link: https://arxiv.org/abs/2106.15298 Abstract: We believe that "all men are created equal". With the rise of the police shootings reported by media, more people in the U.S. think that police use excessive force during law enforcement, especially to a specific group of people. We want to apply multidimensional statistical analysis to reveal more facts than the monotone mainstream media. Our paper has three parts. First, we proposed a new method to quantify fatal police shooting news reporting deviation of mainstream media, which includes CNN, FOX, ABC, and NBC. Second, we analyzed the most comprehensive US fatal police shooting dataset from Washington Post. We used FP-growth to reveal the frequent patterns and DBSCAN clustering to find fatal shooting hotspots. We brought multi-attributes (social economics, demographics, political tendency, education, gun ownership rate, police training hours, etc.) to reveal connections under the iceberg. We found that the police shooting rate of a state depends on many variables. The top four most relevant attributes were state joined year, state land area, gun ownership rate, and violent crime rate. Third, we proposed four regression models to predict police shooting rates at the state level. The best model Kstar could predict the fatal police shooting rate with a correlation coefficient of about 88.53%. We also proposed classification models, including Gradient Boosting Machine, Multi-class Classifier, Logistic Regression, and Naive Bayes Classifier, to predict the race of fatal police shooting victims. Our classification models show no significant evidence to conclude that racial discrimination happened during fatal police shootings recorded by the WP dataset.
Detection (8 papers)
【1】 Detecting Cattle and Elk in the Wild from Space
Authors: Caleb Robinson, Anthony Ortiz, Lacey Hughey, Jared A. Stabach, Juan M. Lavista Ferres Affiliations: Smithsonian Conservation Biology Institute Comments: Presented at the KDD 2021 Fragile Earth Workshop Link: https://arxiv.org/abs/2106.15448 Abstract: Localizing and counting large ungulates (hoofed mammals like cows and elk) in very high-resolution satellite imagery is an important task for supporting ecological studies. Prior work has shown that this is feasible with deep learning based methods and sub-meter multi-spectral satellite imagery. We extend this line of work by proposing a baseline method, CowNet, that simultaneously estimates the number of animals in an image (counts), as well as predicts their location at a pixel level (localizes). We also propose a methodology for evaluating such models on counting and localization tasks across large scenes that takes the uncertainty of noisy labels and the information needed by stakeholders in ecological monitoring tasks into account. Finally, we benchmark our baseline method with state-of-the-art vision methods for counting objects in scenes. We specifically test the temporal generalization of the resulting models over a large landscape in Point Reyes Seashore, CA. We find that the LC-FCN model performs the best and achieves an average precision between 0.56 and 0.61 and an average recall between 0.78 and 0.92 over three held out test scenes.
【2】 Online Interaction Detection for Click-Through Rate Prediction
Authors: Qiuqiang Lin, Chuanhou Gao Affiliations: School of Mathematical Sciences, Zhejiang University Comments: 11 pages, 4 figures, 1 supplement Link: https://arxiv.org/abs/2106.15400 Abstract: Click-Through Rate prediction aims to predict the ratio of clicks to impressions of a specific link. This is a challenging task since (1) there are usually categorical features, and the inputs will be extremely high-dimensional if one-hot encoding is applied, (2) not only the original features but also their interactions are important, (3) an effective prediction may rely on different features and interactions in different time periods. To overcome these difficulties, we propose a new interaction detection method, named Online Random Intersection Chains. The method, which is based on the idea of frequent itemset mining, detects informative interactions by observing the intersections of randomly chosen samples. The discovered interactions enjoy high interpretability as they can be comprehended as logical expressions. ORIC can be updated every time new data is collected, without being retrained on historical data. What's more, the importance of the historical and latest data can be controlled by a tuning parameter. A framework is designed to deal with the streaming interactions, so almost all existing models for CTR prediction can be applied after interaction detection. Empirical results demonstrate the efficiency and effectiveness of ORIC on three benchmark datasets.
【3】 Anomaly Detection and Automated Labeling for Voter Registration File Changes
Authors: Sam Royston, Ben Greenberg, Omeed Tavasoli, Courtenay Cotton Affiliations: VoteShield, Protect Democracy, New York, USA Link: https://arxiv.org/abs/2106.15285 Abstract: Voter eligibility in United States elections is determined by a patchwork of state databases containing information about which citizens are eligible to vote. Administrators at the state and local level are faced with the exceedingly difficult task of ensuring that each of their jurisdictions is properly managed, while also monitoring for improper modifications to the database. Monitoring changes to Voter Registration Files (VRFs) is crucial, given that a malicious actor wishing to disrupt the democratic process in the US would be well-advised to manipulate the contents of these files in order to achieve their goals. In 2020, we saw election officials perform admirably when faced with administering one of the most contentious elections in US history, but much work remains to secure and monitor the election systems Americans rely on. Using data created by comparing snapshots taken of VRFs over time, we present a set of methods that make use of machine learning to ease the burden on analysts and administrators in protecting voter rolls. We first evaluate the effectiveness of multiple unsupervised anomaly detection methods in detecting VRF modifications by modeling anomalous changes as sparse additive noise. In this setting we determine that statistical models comparing administrative districts within a short time span and non-negative matrix factorization are most effective for surfacing anomalous events for review. These methods were deployed during 2019-2020 in our organization's monitoring system and were used in collaboration with the office of the Iowa Secretary of State. Additionally, we propose a newly deployed model which uses historical and demographic metadata to label the likely root cause of database modifications. We hope to use this model to predict which modifications have known causes and therefore better identify potentially anomalous modifications.
【4】 On-board Volcanic Eruption Detection through CNNs and Satellite Multispectral Imagery
Authors: Maria Pia Del Rosso, Alessandro Sebastianelli, Dario Spiller, Pierre Philippe Mathieu, Silvia Liberata Ullo Comments: This is an ongoing work to be revised and submitted to a journal Link: https://arxiv.org/abs/2106.15281 Abstract: In recent years, the growth of Machine Learning algorithms in a variety of different applications has prompted numerous studies on the applicability of these algorithms in real scenarios. Among all, one of the hardest scenarios, due to its physical requirements, is the aerospace one. In this context, the authors of this work aim to propose a first prototype and a study of feasibility for an AI model to be 'loaded' on board. As a case study, the authors decided to investigate the detection of volcanic eruptions as a method to swiftly produce alerts. Two Convolutional Neural Networks have been proposed and created, also showing how to correctly implement them on real hardware and how the complexity of a CNN can be adapted to fit computational requirements.
【5】 Do Not Deceive Your Employer with a Virtual Background: A Video Conferencing Manipulation-Detection System
Authors: Mauro Conti, Simone Milani, Ehsan Nowroozi, Gabriele Orazi Affiliations: Department of Mathematics, University of Padua, Padua, Italy; Department of Information Engineering, University of Padua, Padua, Italy Comments: 6 pages Link: https://arxiv.org/abs/2106.15130 Abstract: The last-generation video conferencing software allows users to utilize a virtual background to conceal their personal environment due to privacy concerns, especially in official meetings with other employers. On the other hand, users may want to fool people in the meeting by using the virtual background to conceal where they are. In this case, developing tools to detect virtual backgrounds used to fool people in meetings plays an important role. Besides, such detectors must prove robust against different kinds of attacks since a malicious user can fool the detector by applying a set of adversarial editing steps on the video to conceal any revealing footprint. In this paper, we study the feasibility of an efficient tool to detect whether a videoconferencing user background is real. In particular, we provide the first tool which computes pixel co-occurrence matrices and uses them to search for inconsistencies among spectral and spatial bands. Our experiments confirm that cross co-occurrence matrices improve the robustness of the detector against different kinds of attacks. This work's performance is especially noteworthy with regard to color SPAM features. Moreover, the performance is especially significant with regard to robustness versus post-processing, like geometric transformations, filtering, contrast enhancement, and JPEG compression with different quality factors.
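The co-occurrence matrices mentioned in the abstract above can be illustrated with a small sketch. It assumes images are given as nested lists of integer pixel levels and counts value pairs at a fixed offset, either within one band (spatial) or across two colour bands (cross-band). The function name, offsets, and the 2x2 sample data are hypothetical; the detector built on top of such matrices in the paper is not reproduced here.

```python
def cooccurrence(band_a, band_b, dx, dy, levels):
    """Count how often value u in band_a at (r, c) occurs together with
    value v in band_b at (r + dy, c + dx).

    Passing the same band twice gives an ordinary spatial co-occurrence
    matrix; passing two colour channels gives a cross-band matrix.
    """
    rows, cols = len(band_a), len(band_a[0])
    counts = [[0] * levels for _ in range(levels)]
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[band_a[r][c]][band_b[r2][c2]] += 1
    return counts

# Tiny two-level example "image" with two bands.
red = [[0, 1],
       [1, 0]]
green = [[0, 0],
         [1, 1]]

spatial = cooccurrence(red, red, dx=1, dy=0, levels=2)    # horizontal neighbours
cross = cooccurrence(red, green, dx=0, dy=0, levels=2)    # same pixel, two bands
print(spatial)
print(cross)
```

In practice the matrices would be computed over 256 levels (or truncated residuals) and fed to a classifier; the sketch only shows the counting step.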
【6】 FallDeF5: A Fall Detection Framework Using 5G-based Deep Gated Recurrent Unit Networks
Authors: Mabrook S. Al-Rakhami, Abdu Gumaei, Meteb Altaf, Mohammad Mehedi Hassan, Bader Fahad Alkhamees, Khan Muhammad, Giancarlo Fortino Affiliations: Research Chair of Pervasive and Mobile Computing; Information Systems Department, College of Computer and Information Sciences, King Saud; Computer Science Department, Taiz University, Taiz, Yemen Link: https://arxiv.org/abs/2106.15049 Abstract: Fall prevalence is high among elderly people, which is challenging due to the severe consequences of falling. This is why rapid assistance is a critical task. Ambient assisted living (AAL) uses recent technologies such as 5G networks and the internet of medical things (IoMT) to address this research area. Edge computing can reduce the cost of cloud communication, including high latency and bandwidth use, by moving conventional healthcare services and applications closer to end-users. Artificial intelligence (AI) techniques such as deep learning (DL) have been used recently for automatic fall detection, as well as supporting healthcare services. However, DL requires a vast amount of data and substantial processing power to improve its performance for the IoMT linked to the traditional edge computing environment. This research proposes an effective fall detection framework based on DL algorithms and mobile edge computing (MEC) within 5G wireless networks, the aim being to empower IoMT-based healthcare applications. We also propose the use of a deep gated recurrent unit (DGRU) neural network to improve the accuracy of existing DL-based fall detection methods. DGRU has the advantage of dealing with time-series IoMT data, and it can reduce the number of parameters and avoid the vanishing gradient problem. The experimental results on two public datasets show that the DGRU model of the proposed framework achieves higher accuracy rates compared to the current related works on the same datasets.
【7】 Feature selection for intrusion detection systems
Authors: Firuz Kamalov, Sherif Moussa, Rita Zgheib, Omar Mashaal Comments: Accepted version of conference paper presented at ISCID 2020 Link: https://arxiv.org/abs/2106.14941 Abstract: In this paper, we analyze existing feature selection methods to identify the key elements of network traffic data that allow intrusion detection. In addition, we propose a new feature selection method that addresses the challenge of considering continuous input features and discrete target values. We show that the proposed method performs well against the benchmark selection methods. We use our findings to develop a highly effective machine learning-based detection system that achieves 99.9% accuracy in distinguishing between DDoS and benign signals. We believe that our results can be useful to experts who are interested in designing and building automated intrusion detection systems.
【8】 DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
Authors: Thi Ngoc Tho Nguyen, Karn Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon Seng Gan Affiliations: School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore; Dept. of Electrical and Computer Engineering, University of Illinois at Urbana-Champaign, IL, USA Comments: 5 pages, Technical Report for DCASE 2021 Challenge Task 3 Link: https://arxiv.org/abs/2106.15190 Abstract: Sound event localization and detection consists of two subtasks which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses magnitude or phase differences between microphones to estimate source directions. Therefore, it is often difficult to jointly train these two subtasks simultaneously. We propose a novel feature called spatial cue-augmented log-spectrogram (SALSA) with exact time-frequency mapping between the signal power and the source direction-of-arrival. The feature includes multichannel log-spectrograms stacked along with the estimated direct-to-reverberant ratio and a normalized version of the principal eigenvector of the spatial covariance matrix at each time-frequency bin on the spectrograms. Experimental results on the DCASE 2021 dataset for sound event localization and detection with directional interference showed that the deep learning-based models trained on this new feature outperformed the DCASE challenge baseline by a large margin. We combined several models with slightly different architectures that were trained on the new feature to further improve the system performances for the DCASE sound event localization and detection challenge.
Classification|Recognition (6 papers)
【1】 Classification of Consumer Belief Statements From Social Media
Authors: Gerhard Hagerer, Wenbin Le, Hannah Danner, Georg Groh Affiliations: Social Computing Research Group, Technical University of Munich; Chair of Marketing and Consumer Research, Technical University of Munich Link: https://arxiv.org/abs/2106.15498 Abstract: Social media offer plenty of information to perform market research in order to meet the requirements of customers. One way this research is conducted is that a domain expert gathers and categorizes user-generated content into a complex and fine-grained class structure. In many such cases, little data meets complex annotations. It is not yet fully understood how this can be leveraged successfully for classification. We examine the classification accuracy of expert labels when used with a) many fine-grained classes and b) few abstract classes. For scenario b) we compare abstract class labels given by the domain expert as baseline and by automatic hierarchical clustering. We compare this to another baseline where the entire class structure is given by a completely unsupervised clustering approach. By doing so, this work can serve as an example of how complex expert annotations are potentially beneficial and can be utilized in the optimal way for opinion mining in highly specific domains. By exploring across a range of techniques and experiments, we find that automated class abstraction approaches, in particular the unsupervised approach, perform remarkably well against the domain expert baseline on text classification tasks. This has the potential to inspire opinion mining applications in order to support market researchers in practice and to inspire fine-grained automated content analysis on a large scale.
【2】 A Bytecode-based Approach for Smart Contract Classification
Authors: Chaochen Shi, Yong Xiang, Robin Ram Mohan Doss, Jiangshan Yu, Keshav Sood, Longxiang Gao Affiliations: Senior Member, IEEE Comments: 10 pages, 6 figures Link: https://arxiv.org/abs/2106.15497 Abstract: With the development of blockchain technologies, the number of smart contracts deployed on blockchain platforms is growing exponentially, which makes it difficult for users to find desired services by manual screening. The automatic classification of smart contracts can provide blockchain users with keyword-based contract searching and helps to manage smart contracts effectively. Current research on smart contract classification focuses on Natural Language Processing (NLP) solutions which are based on contract source code. However, more than 94% of smart contracts are not open-source, so the application scenarios of NLP methods are very limited. Meanwhile, NLP models are vulnerable to adversarial attacks. This paper proposes a classification model based on features from contract bytecode instead of source code to solve these problems. We also use feature selection and ensemble learning to optimize the model. Our experimental studies on over 3,300 real-world Ethereum smart contracts show that our model can classify smart contracts without source code and has better performance than baseline models. Our model also has good resistance to adversarial attacks compared with NLP-based models. In addition, our analysis reveals that account features used in many smart contract classification models have little effect on classification and can be excluded.
【3】 Explaining the Performance of Multi-label Classification Methods with Data Set Properties
Authors: Jasmin Bogatinovski, Ljupčo Todorovski, Sašo Džeroski, Dragi Kocev Affiliations: Jožef Stefan Institute, Ljubljana, Slovenia; Jožef Stefan IPSchool, Ljubljana, Slovenia; Dept. of Distributed Operating Systems, TU Berlin, Germany; University of Ljubljana, Slovenia; Bias Variance Labs, Ljubljana, Slovenia Link: https://arxiv.org/abs/2106.15411 Abstract: Meta learning generalizes the empirical experience with different learning tasks and holds promise for providing important empirical insight into the behaviour of machine learning algorithms. In this paper, we present a comprehensive meta-learning study of data sets and methods for multi-label classification (MLC). MLC is a practically relevant machine learning task where each example is labelled with multiple labels simultaneously. Here, we analyze 40 MLC data sets by using 50 meta features describing different properties of the data. The main findings of this study are as follows. First, the most prominent meta features that describe the space of MLC data sets are the ones assessing different aspects of the label space. Second, the meta models show that the most important meta features describe the label space, and the meta features describing the relationships among the labels tend to occur a bit more often than the meta features describing the distributions between and within the individual labels. Third, the optimization of the hyperparameters can improve the predictive performance; however, quite often the extent of the improvements does not justify the resource utilization.
【4】 Similarity Embedding Networks for Robust Human Activity Recognition 标题:基于相似嵌入网络的鲁棒人体活动识别
作者:Chenglin Li,Carrie Lu Tong,Di Niu,Bei Jiang,Xiao Zuo,Lei Cheng,Jian Xiong,Jianming Yang 机构:University of Alberta 链接:https://arxiv.org/abs/2106.15283 摘要:基于传感器数据的人类活动识别(HAR)深度学习模型是近年来研究的热点。然而,由于难以获得高质量的标记活动数据,深度模型对复杂现实世界HAR数据的泛化能力受到限制。本文设计了一种相似嵌入神经网络,通过精心设计的卷积层和LSTM层将传感器输入信号映射到实向量上。嵌入网络采用成对相似性损失训练,鼓励在嵌入的实空间中对同一类的样本进行聚类,并且可以在小样本集上甚至在带有错误标记样本的噪声数据集上进行有效的训练。基于学习到的嵌入,我们进一步提出了非参数和参数的活动识别方法。基于两个公共数据集的广泛评估表明,所提出的相似性嵌入网络在HAR分类任务上显著优于现有的深度模型,对训练集中的错误标记样本具有鲁棒性,并且还可用于对含噪数据集进行有效去噪。 摘要:Deep learning models for human activity recognition (HAR) based on sensor data have been heavily studied recently. However, the generalization ability of deep models on complex real-world HAR data is limited by the availability of high-quality labeled activity data, which are hard to obtain. In this paper, we design a similarity embedding neural network that maps input sensor signals onto real vectors through carefully designed convolutional and LSTM layers. The embedding network is trained with a pairwise similarity loss, encouraging the clustering of samples from the same class in the embedded real space, and can be effectively trained on a small dataset and even on a noisy dataset with mislabeled samples. Based on the learned embeddings, we further propose both nonparametric and parametric approaches for activity recognition. Extensive evaluation based on two public datasets has shown that the proposed similarity embedding network significantly outperforms state-of-the-art deep models on HAR classification tasks, is robust to mislabeled samples in the training set, and can also be used to effectively denoise a noisy dataset.
【5】 ElephantBook: A Semi-Automated Human-in-the-Loop System for Elephant Re-Identification 标题:ElephantBook:一种用于大象再识别的半自动人在环系统
作者:Peter Kulits,Jake Wall,Anka Bedetti,Michelle Henley,Sara Beery 链接:https://arxiv.org/abs/2106.15083 摘要:非洲象对它们的生态系统至关重要,但它们的种群正受到人象冲突和偷猎上升的威胁。监测种群动态对保护工作至关重要;然而,追踪大象是一项困难的任务,通常依赖于侵入性的、有时甚至危险的GPS项圈佩戴。尽管最近在使用计算机视觉技术自动识别其他物种方面取得了许多成功,但识别大象是极其困难的,通常需要专业知识以及熟悉种群中的大象。我们构建并部署了一个名为ElephantBook的基于网络的平台和数据库,它结合人工属性标注和最先进的计算机视觉算法,对大象进行人在回路的重新识别。我们的系统目前正在马拉大象项目中使用,帮助监测大马赛马拉生态系统中受保护和有风险的大象种群。ElephantBook使大象的重新识别可供非专家使用,并可扩展为多个保护非政府组织使用。 摘要:African elephants are vital to their ecosystems, but their populations are threatened by a rise in human-elephant conflict and poaching. Monitoring population dynamics is essential in conservation efforts; however, tracking elephants is a difficult task, usually relying on the invasive and sometimes dangerous placement of GPS collars. Although there have been many recent successes in the use of computer vision techniques for automated identification of other species, identification of elephants is extremely difficult and typically requires expertise as well as familiarity with elephants in the population. We have built and deployed a web-based platform and database, known as ElephantBook, for human-in-the-loop re-identification of elephants, combining manual attribute labeling and state-of-the-art computer vision algorithms. Our system is currently in use at the Mara Elephant Project, helping monitor the protected and at-risk population of elephants in the Greater Maasai Mara ecosystem. ElephantBook makes elephant re-identification usable by non-experts and scalable for use by multiple conservation NGOs.
【6】 Early Mobility Recognition for Intensive Care Unit Patients Using Accelerometers 标题:使用加速度计对重症监护病房患者的早期活动识别
作者:Rex Liu,Sarina A Fazio,Huanle Zhang,Albara Ah Ramli,Xin Liu,Jason Yeates Adams 机构:Department of Computer Science, University of California, Davis, Davis, California, USA, Department of Internal Medicine, Division of Pulmonary, Critical Care, and Sleep Medicine, Sacramento, California, USA 链接:https://arxiv.org/abs/2106.15017 摘要:随着物联网(IoT)和人工智能(AI)技术的发展,人类活动识别已经实现了智能家居和辅助生活等各种应用。在本文中,我们的目标是人类活动识别的一个新的医疗保健应用,重症监护病房(ICU)患者的早期活动识别。对于长期卧床的ICU患者,早期活动是必不可少的。我们的系统包括基于加速计的ICU病人数据采集和人工智能模型来识别病人的早期活动。为了提高模型的准确性和稳定性,我们识别了对传感器方向不敏感的特征,并提出了一种分段投票过程,利用多数投票策略来识别每个分段的活动。结果表明,与没有特征工程和分段投票过程的人工智能模型相比,该系统将模型精度从77.78%提高到81.86%,模型不稳定性(标准差)从16.69%降低到6.92%。 摘要:With the development of the Internet of Things (IoT) and Artificial Intelligence (AI) technologies, human activity recognition has enabled various applications, such as smart homes and assisted living. In this paper, we target a new healthcare application of human activity recognition, early mobility recognition for Intensive Care Unit (ICU) patients. Early mobility is essential for ICU patients who suffer from long-time immobilization. Our system includes accelerometer-based data collection from ICU patients and an AI model to recognize patients' early mobility. To improve the model accuracy and stability, we identify features that are insensitive to sensor orientations and propose a segment voting process that leverages a majority voting strategy to recognize each segment's activity. Our results show that our system improves model accuracy from 77.78% to 81.86% and reduces the model instability (standard deviation) from 16.69% to 6.92%, compared to the same AI model without our feature engineering and segment voting process.
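The segment voting step described in the abstract above can be pictured with a small sketch. It assumes, hypothetically, that a classifier has already produced one label per accelerometer window and that a segment is a fixed number of consecutive windows; the function name `segment_vote`, the activity labels, and the segment length are illustrative and are not taken from the paper.

```python
from collections import Counter


def segment_vote(window_preds, segment_len):
    """Replace each window's label with the majority label of its segment.

    window_preds: per-window predicted labels, in time order (assumed input).
    segment_len:  number of consecutive windows per segment (assumed parameter).
    """
    voted = []
    for start in range(0, len(window_preds), segment_len):
        segment = window_preds[start:start + segment_len]
        # most_common(1) returns the most frequent label; equal counts are
        # resolved in favour of the label seen first in the segment.
        majority = Counter(segment).most_common(1)[0][0]
        voted.extend([majority] * len(segment))
    return voted


# Hypothetical per-window predictions with two isolated misclassifications.
preds = ["lying", "lying", "sitting", "lying",
         "sitting", "sitting", "sitting", "lying"]
smoothed = segment_vote(preds, segment_len=4)
```

With these inputs each segment of four windows collapses to its dominant label, which is one plausible way a voting step could reduce the run-to-run instability the abstract reports.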
表征(1篇)
【1】 Open-Set Representation Learning through Combinatorial Embedding 标题:基于组合嵌入的开集表示学习
作者:Geeho Kim,Bohyung Han 机构:Computer Vision Lab. & ASRI, Seoul National University 备注:12 pages, 4 figures 链接:https://arxiv.org/abs/2106.15278 摘要:视觉识别任务通常仅限于处理一小部分类,因为其余类的标签不可用。我们感兴趣的是通过基于标记类和未标记类的实例的表示学习来识别数据集中的新概念,并将识别范围扩展到已知类和新类。为了解决这个具有挑战性的任务,我们提出了一种组合学习方法,该方法利用多个有监督的元分类器在异构标签空间上给出的组合知识,自然地将示例聚类到不可见的类中。我们还引入了一种度量学习策略来估计成对伪标签,以改进未标记示例的表示,有效地保留了已知类和新类之间的语义关系。该算法通过联合优化来发现新的概念,增强未知类的可辨性,同时学习可泛化到新类的已知类的表示。我们的大量实验表明,该方法在多个图像检索和新类发现基准测试中取得了显著的性能改进。 摘要:Visual recognition tasks are often limited to dealing with a small subset of classes simply because the labels for the remaining classes are unavailable. We are interested in identifying novel concepts in a dataset through representation learning based on the examples in both labeled and unlabeled classes, and extending the horizon of recognition to both known and novel classes. To address this challenging task, we propose a combinatorial learning approach, which naturally clusters the examples in unseen classes using the compositional knowledge given by multiple supervised meta-classifiers on heterogeneous label spaces. We also introduce a metric learning strategy to estimate pairwise pseudo-labels for improving representations of unlabeled examples, which preserves semantic relations across known and novel classes effectively. The proposed algorithm discovers novel concepts via a joint optimization of enhancing the discriminativeness of unseen classes as well as learning the representations of known classes generalizable to novel ones. Our extensive experiments demonstrate remarkable performance gains by the proposed approach in multiple image retrieval and novel class discovery benchmarks.
编码器(1篇)
【1】 On exploring practical potentials of quantum auto-encoder with advantages 标题:发挥量子自动编码器优势发掘实用潜力
作者:Yuxuan Du,Dacheng Tao 机构:JD Explore Academy 链接:https://arxiv.org/abs/2106.15432 摘要:量子自动编码器(QAE)是缓解量子物理中维数灾难的有力工具,它以能够从高维空间中的量子态中提取低维模式而著称。尽管QAE具有诱人的性质,但人们对其具有可证明优势的实际应用知之甚少。为了解决这些问题,我们证明了QAE可以有效地计算具有低秩性质的高维量子态的本征值并制备相应的本征向量。为此,我们设计了三种有效的基于QAE的学习协议,分别解决了低秩状态保真度估计、量子Gibbs态制备和量子计量任务。值得注意的是,所有这些协议都是可扩展的,并且可以在近期量子机器上容易地执行。此外,我们还证明了所提出的基于QAE的方法的误差界优于以前的文献。数值模拟印证了我们的理论分析。我们的工作为利用QAE以可扩展的方式解决各种量子物理和量子信息处理问题开辟了一条新的途径。 摘要:Quantum auto-encoder (QAE) is a powerful tool to relieve the curse of dimensionality encountered in quantum physics, celebrated for its ability to extract low-dimensional patterns from quantum states living in the high-dimensional space. Despite its attractive properties, little is known about the practical applications of QAE with provable advantages. To address these issues, here we prove that QAE can be used to efficiently calculate the eigenvalues and prepare the corresponding eigenvectors of a high-dimensional quantum state with the low-rank property. In this regard, we devise three effective QAE-based learning protocols to solve the low-rank state fidelity estimation, the quantum Gibbs state preparation, and the quantum metrology tasks, respectively. Notably, all of these protocols are scalable and can be readily executed on near-term quantum machines. Moreover, we prove that the error bounds of the proposed QAE-based methods outperform those in previous literature. Numerical simulations corroborate our theoretical analysis. Our work opens a new avenue of utilizing QAE to tackle various quantum physics and quantum information processing problems in a scalable way.
优化|敛散性(8篇)
【1】 Attentive Neural Processes and Batch Bayesian Optimization for Scalable Calibration of Physics-Informed Digital Twins 标题:注意力神经过程和批量贝叶斯优化在物理信息数字孪生可扩展校准中的应用
作者:Ankush Chakrabarty,Gordon Wichern,Christopher Laughman 备注:12 pages, accepted to ICML 2021 Workshop on Tackling Climate Change with Machine Learning 链接:https://arxiv.org/abs/2106.15502 摘要:以物理为基础的动态系统模型是构建环境的数字孪生体的关键组成部分。这些数字孪生子使节能基础设施的设计成为可能,但必须进行适当校准,以准确反映系统行为,以便进行下游预测和分析。现代建筑的动力系统模型通常由大量的参数来描述,在仿真过程中会产生大量的计算开销。为了在无需过度模拟的情况下处理数字孪生子的大规模校准,我们提出了ANP-BBO:一种利用注意力神经过程(ANP)的可扩展并行批处理贝叶斯优化(BBO)方法。 摘要:Physics-informed dynamical system models form critical components of digital twins of the built environment. These digital twins enable the design of energy-efficient infrastructure, but must be properly calibrated to accurately reflect system behavior for downstream prediction and analysis. Dynamical system models of modern buildings are typically described by a large number of parameters and incur significant computational expenditure during simulations. To handle large-scale calibration of digital twins without exorbitant simulations, we propose ANP-BBO: a scalable and parallelizable batch-wise Bayesian optimization (BBO) methodology that leverages attentive neural processes (ANPs).
【2】 An Efficient Batch Constrained Bayesian Optimization Approach for Analog Circuit Synthesis via Multi-objective Acquisition Ensemble 标题:基于多目标采集集成的模拟电路批量约束贝叶斯优化综合方法
作者:Shuhan Zhang,Fan Yang,Changhao Yan,Dian Zhou,Xuan Zeng 机构:Fudan University, Department of Electrical Engineering, The University of Texas at Dallas 备注:14 pages, 5 figures 链接:https://arxiv.org/abs/2106.15412 摘要:贝叶斯优化是一种很有前途的模拟电路综合方法。然而,贝叶斯优化框架的顺序性极大地限制了它充分利用现实世界计算资源的能力。本文提出了一种基于多目标采集函数集成(MACE)的高效并行贝叶斯优化算法,进一步加快了优化过程。通过从改进概率(PI)、期望改进(EI)和置信下限(LCB)的帕累托前沿采样查询点,我们结合了最先进的采集函数的优点,在无约束优化问题的探索和利用之间实现了微妙的权衡。基于这种批量设计,我们进一步针对约束优化问题调整了算法。通过将优化过程分为两个阶段,首先寻找一个初始可行点,从而获得更多的有效区域信息,避免在不可行区域附近采样。在获得第一个可行点后,我们通过对采集函数集成采用特殊设计的惩罚项来偏向可行域。实验结果定量地表明,对于无约束优化问题,当批量大小为15时,与差分进化(DE)算法相比,该算法最多可将总仿真时间减少74倍;对于约束优化问题,当批量大小为15时,与基于加权期望改进的贝叶斯优化(WEIBO)方法相比,该算法最多可将优化过程加速15倍。 摘要:Bayesian optimization is a promising methodology for analog circuit synthesis. However, the sequential nature of the Bayesian optimization framework significantly limits its ability to fully utilize real-world computational resources. In this paper, we propose an efficient parallelizable Bayesian optimization algorithm via Multi-objective ACquisition function Ensemble (MACE) to further accelerate the optimization procedure. By sampling query points from the Pareto front of the probability of improvement (PI), expected improvement (EI) and lower confidence bound (LCB), we combine the benefits of state-of-the-art acquisition functions to achieve a delicate tradeoff between exploration and exploitation for the unconstrained optimization problem. Based on this batch design, we further adjust the algorithm for the constrained optimization problem. By dividing the optimization procedure into two stages and first focusing on finding an initial feasible point, we manage to gain more information about the valid region and can better avoid sampling around the infeasible area. After achieving the first feasible point, we favor the feasible region by adopting a specially designed penalization term to the acquisition function ensemble.
The experimental results quantitatively demonstrate that our proposed algorithm can reduce the overall simulation time by up to 74 times compared to differential evolution (DE) for the unconstrained optimization problem when the batch size is 15. For the constrained optimization problem, our proposed algorithm can speed up the optimization process by up to 15 times compared to the weighted expected improvement based Bayesian optimization (WEIBO) approach, when the batch size is 15.
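The idea of drawing a batch from the Pareto front of several acquisition functions can be illustrated with a minimal sketch. It assumes a finite set of candidate design points whose PI, EI and LCB values have already been computed; the candidate names and numbers are made up, and the brute-force non-dominated filter below is a stand-in chosen for readability, since the abstract does not say how the front is actually searched.

```python
import random


def dominates(a, b):
    """True if vector a is no worse than b in every objective and strictly
    better in at least one (all objectives are minimized)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def pareto_front(objectives):
    """Return the keys whose objective vectors are not dominated by any other."""
    front = []
    for name, obj in objectives.items():
        others = (o for n, o in objectives.items() if n != name)
        if not any(dominates(other, obj) for other in others):
            front.append(name)
    return front


# Hypothetical (PI, EI, LCB) values for four candidate design points.
acq = {
    "x1": (0.90, 0.10, 0.50),
    "x2": (0.60, 0.40, 0.20),
    "x3": (0.50, 0.30, 0.60),
    "x4": (0.20, 0.50, 0.10),
}
# PI and EI are maximized and LCB is minimized, so negate PI and EI to
# express everything as minimization.
objectives = {k: (-pi, -ei, lcb) for k, (pi, ei, lcb) in acq.items()}
front = pareto_front(objectives)

# Sample a batch of query points from the front (batch size 2 is arbitrary).
rng = random.Random(0)
batch = rng.sample(front, k=2)
```

Here `x3` is dropped because `x2` is at least as good on all three criteria, while the remaining points each win on some criterion, which is the kind of exploration/exploitation spread the abstract attributes to the ensemble.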
【3】 Reliable and Fast Recurrent Neural Network Architecture Optimization 标题:可靠快速的递归神经网络结构优化
作者:Andrés Camero,Jamal Toutouh,Enrique Alba 机构:ITIS Software, Universidad de Málaga, Malaga, Spain 链接:https://arxiv.org/abs/2106.15295 摘要:本文介绍了一种基于随机误差采样的神经进化算法(RESN),它是一种新的递归神经网络结构自动优化方法。RESN结合了进化算法和无训练评估方法。结果表明,RESN在减少一半计算时间的同时,达到了最先进的误差性能。 摘要:This article introduces Random Error Sampling-based Neuroevolution (RESN), a novel automatic method to optimize recurrent neural network architectures. RESN combines an evolutionary algorithm with a training-free evaluation approach. The results show that RESN achieves state-of-the-art error performance while reducing by half the computational time.
【4】 Optimal Rates for Random Order Online Optimization 标题:随机顺序在线优化的最优速率
作者:Uri Sherman,Tomer Koren,Yishay Mansour 链接:https://arxiv.org/abs/2106.15207 摘要:我们研究了Garber et al. (2020)最近提出的随机顺序模型中的在线凸优化问题,其中损失函数可以由对手选择,但随后以均匀随机的顺序呈现给在线算法。针对累积损失函数是(强)凸的,而单个损失函数是光滑的但可能非凸的情况,我们给出了达到最优界的算法,其结果显著优于Garber et al. (2020),完全消除了维数依赖,并改善了其界对强凸性参数的依赖。我们的分析依赖于无放回采样下算法稳定性和泛化之间的新联系,类似于有放回i.i.d.设置中已研究的联系,以及对随机梯度下降的精细平均稳定性分析。 摘要:We study online convex optimization in the random order model, recently proposed by Garber et al. (2020), where the loss functions may be chosen by an adversary, but are then presented to the online algorithm in a uniformly random order. Focusing on the scenario where the cumulative loss function is (strongly) convex, yet individual loss functions are smooth but might be non-convex, we give algorithms that achieve the optimal bounds and significantly outperform the results of Garber et al. (2020), completely removing the dimension dependence and improving their scaling with respect to the strong convexity parameter. Our analysis relies on novel connections between algorithmic stability and generalization for sampling without replacement, analogous to those studied in the with-replacement i.i.d. setting, as well as on a refined average stability analysis of stochastic gradient descent.
【5】 End-to-end Waveform Learning Through Joint Optimization of Pulse and Constellation Shaping 标题:基于脉冲和星座成形联合优化的端到端波形学习
作者:Fayçal Ait Aoudia,Jakob Hoydis 机构:Nokia Bell Labs, Paris-Saclay, Nozay, France, NVIDIA, Sophia Antipolis, France 链接:https://arxiv.org/abs/2106.15158 摘要:随着通信系统被预见到能够实现新的服务,例如联合通信和传感,并利用部分亚太赫兹频谱,能够支持这些新兴应用的新型波形的设计变得越来越具有挑战性。在这项工作中,我们提出了一个端到端的学习方法来设计波形,通过联合学习脉冲成形和星座几何,以及一个基于神经网络(NN)的接收器。在满足带外发射和功率包络约束的前提下,对系统进行优化,使信息传输速率达到最大。我们的结果表明,该方法能使相邻信道泄漏率(ACLR)减小几个数量级,峰值平均功率比(PAPR)与传统滤波器相比具有竞争力,而在加性高斯白噪声(AWGN)信道上,信息率没有显著损失,在发射器上没有额外的复杂性。 摘要:As communication systems are foreseen to enable new services such as joint communication and sensing and utilize parts of the sub-THz spectrum, the design of novel waveforms that can support these emerging applications becomes increasingly challenging. We present in this work an end-to-end learning approach to design waveforms through joint learning of pulse shaping and constellation geometry, together with a neural network (NN)-based receiver. Optimization is performed to maximize an achievable information rate, while satisfying constraints on out-of-band emission and power envelope. Our results show that the proposed approach enables up to orders of magnitude smaller adjacent channel leakage ratios (ACLRs) with peak-to-average power ratios (PAPRs) competitive with traditional filters, without significant loss of information rate on an additive white Gaussian noise (AWGN) channel, and no additional complexity at the transmitter.
【6】 Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction 标题:小随机初始化类似于谱学习:过参数化低秩矩阵重构的优化和泛化保证
作者:Dominik Stöger,Mahdi Soltanolkotabi 机构:Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California 备注:80 pages 链接:https://arxiv.org/abs/2106.15013 摘要:近年来,对于基于梯度的超参数化非凸损失方法的收敛性和推广性的研究取得了重要的理论进展。然而,许多方面的优化和推广,特别是小随机初始化的关键作用还没有完全了解。在本文中,我们通过证明小随机初始化和几个梯度下降迭代的行为类似于流行的谱方法,朝着揭开这个角色的神秘面纱迈出了一步。我们还表明,这种来自小随机初始化的隐式谱偏差,对于过参数化模型更为显著,也使梯度下降迭代在特定的轨迹上朝向不仅全局最优而且具有良好泛化性的解。具体地说,我们关注的问题是通过一个自然的非凸公式从几个测量值重建一个低秩矩阵。在这种情况下,我们证明了从小的随机初始化开始的梯度下降迭代的轨迹可以近似地分解为三个阶段:(I)谱或对齐阶段,其中我们表明迭代具有类似于谱初始化的隐式谱偏差,这使得我们可以证明在这个阶段结束时,列空间迭代和底层低秩矩阵充分对齐,(II)一个鞍回避/细化阶段,我们表明梯度迭代的轨迹远离某些退化鞍点,以及(III)局部细化阶段,我们证明在避免鞍点之后,迭代快速收敛到底层低秩矩阵。我们分析的基础是对超参数非凸优化方案的分析,这些方案可能对低秩重建以外的计算问题有影响。 摘要:Recently there has been significant theoretical progress on understanding the convergence and generalization of gradient-based methods on nonconvex losses with overparameterized models. Nevertheless, many aspects of optimization and generalization and in particular the critical role of small random initialization are not fully understood. In this paper, we take a step towards demystifying this role by proving that small random initialization followed by a few iterations of gradient descent behaves akin to popular spectral methods. We also show that this implicit spectral bias from small random initialization, which is provably more prominent for overparameterized models, also puts the gradient descent iterations on a particular trajectory towards solutions that are not only globally optimal but also generalize well. Concretely, we focus on the problem of reconstructing a low-rank matrix from a few measurements via a natural nonconvex formulation. 
In this setting, we show that the trajectory of the gradient descent iterations from small random initialization can be approximately decomposed into three phases: (I) a spectral or alignment phase, where we show that the iterates have an implicit spectral bias akin to spectral initialization, allowing us to show that at the end of this phase the column space of the iterates and the underlying low-rank matrix are sufficiently aligned; (II) a saddle avoidance/refinement phase, where we show that the trajectory of the gradient iterates moves away from certain degenerate saddle points; and (III) a local refinement phase, where we show that after avoiding the saddles the iterates converge quickly to the underlying low-rank matrix. Underlying our analysis are insights for the analysis of overparameterized nonconvex optimization schemes that may have implications for computational problems beyond low-rank reconstruction.
【7】 Towards Sample-Optimal Compressive Phase Retrieval with Sparse and Generative Priors 标题:基于稀疏生成先验的样本最优压缩相位检索
作者:Zhaoqiang Liu,Subhroshekhar Ghosh,Jonathan Scarlett 机构:Department of Mathematics, National University of Singapore 链接:https://arxiv.org/abs/2106.15358 摘要:压缩相位恢复是标准压缩感知问题的一个流行变体,其中测量值仅包含幅度信息。在这篇论文中,基于深层生成模型的最新进展,我们为具有生成先验的相位恢复提供了具有阶最优样本复杂度界的恢复保证。我们首先证明,当使用i.i.d.高斯测量和$L$-Lipschitz连续生成模型(有界$k$维输入)时,大约$O(k \log L)$个样本足以保证信号接近任何使基于振幅的经验损失函数最小化的向量。用一种实用的算法来达到这个样本复杂度仍然是一个困难的挑战,而一种流行的谱初始化方法被认为是一个主要的瓶颈。为了部分地解决这个问题,我们进一步证明了大约$O(k \log L)$个样本确保了信号和为谱初始化设计的优化问题的任何全局最优解之间足够接近(尽管找到这样的解仍然可能是一个挑战)。我们将这一结果应用于稀疏相位恢复,并证明当底层信号为$s$稀疏和$n$维时,$O(s \log n)$个样本足以满足类似的保证,与信息论下界相匹配。虽然我们的保证并不直接对应于一个实际的算法,但我们受研究结果启发提出了一种实用的谱初始化方法,并通过实验观察到它相对于现有各种稀疏相位恢复谱初始化方法有显著的性能提升。 摘要:Compressive phase retrieval is a popular variant of the standard compressive sensing problem, in which the measurements only contain magnitude information. In this paper, motivated by recent advances in deep generative models, we provide recovery guarantees with order-optimal sample complexity bounds for phase retrieval with generative priors. We first show that when using i.i.d. Gaussian measurements and an $L$-Lipschitz continuous generative model with bounded $k$-dimensional inputs, roughly $O(k \log L)$ samples suffice to guarantee that the signal is close to any vector that minimizes an amplitude-based empirical loss function. Attaining this sample complexity with a practical algorithm remains a difficult challenge, and a popular spectral initialization method has been observed to pose a major bottleneck. To partially address this, we further show that roughly $O(k \log L)$ samples ensure sufficient closeness between the signal and any globally optimal solution to an optimization problem designed for spectral initialization (though finding such a solution may still be challenging).
We adapt this result to sparse phase retrieval, and show that $O(s \log n)$ samples are sufficient for a similar guarantee when the underlying signal is $s$-sparse and $n$-dimensional, matching an information-theoretic lower bound. While our guarantees do not directly correspond to a practical algorithm, we propose a practical spectral initialization method motivated by our findings, and experimentally observe significant performance gains over various existing spectral initialization methods of sparse phase retrieval.
【8】 Robust Distributed Optimization With Randomly Corrupted Gradients 标题:具有随机破坏梯度的鲁棒分布式优化
作者:Berkay Turan,César A. Uribe,Hoi-To Wai,Mahnoosh Alizadeh 备注:17 pages, 3 figures, submitted to IEEE TSP 链接:https://arxiv.org/abs/2106.14956 摘要:在本文中,我们提出了一个一阶分布式优化算法,该算法对拜占庭故障(任意且可能是对抗性的行为)具有可证明的鲁棒性,其中所有参与的代理都容易发生故障。我们将每个代理随时间变化的状态建模为一个两状态马尔可夫链,表示不同时刻的拜占庭行为或可信行为。我们对任意时刻拜占庭代理的最大数量不作限制。我们设计的方法基于三层防御:1)时间梯度平均,2)鲁棒聚合,3)梯度归一化。我们研究了随机优化的两种设置,即样本平均逼近和随机逼近,证明了对于强凸和光滑非凸代价函数,我们的算法获得了阶最优的统计误差和收敛速度。 摘要:In this paper, we propose a first-order distributed optimization algorithm that is provably robust to Byzantine failures (arbitrary and potentially adversarial behavior), where all the participating agents are prone to failure. We model each agent's state over time as a two-state Markov chain that indicates Byzantine or trustworthy behaviors at different time instants. We set no restrictions on the maximum number of Byzantine agents at any given time. We design our method based on three layers of defense: 1) temporal gradient averaging, 2) robust aggregation, and 3) gradient normalization. We study two settings for stochastic optimization, namely Sample Average Approximation and Stochastic Approximation, and prove that for strongly convex and smooth non-convex cost functions, our algorithm achieves order-optimal statistical error and convergence rates.
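The three layers of defense listed in the abstract above can be chained in a few lines. The sketch below is only illustrative: the abstract does not name the robust aggregator, so a coordinate-wise median is swapped in as an assumption, and the window of past rounds, the function names, and the toy gradients are all hypothetical.

```python
import math
from statistics import median


def temporal_average(history):
    """Layer 1: average one agent's gradients over its recent rounds."""
    return [sum(col) / len(history) for col in zip(*history)]


def coordinate_median(vectors):
    """Layer 2 (assumed aggregator): median of each coordinate across agents."""
    return [median(col) for col in zip(*vectors)]


def normalize(vec, eps=1e-12):
    """Layer 3: rescale to (almost exactly) unit Euclidean norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / (norm + eps) for x in vec]


def robust_direction(agent_histories):
    """Apply the three layers in order and return an update direction."""
    averaged = [temporal_average(h) for h in agent_histories]
    return normalize(coordinate_median(averaged))


# Three agents, two rounds each, 2-dimensional gradients (made-up numbers).
agent_histories = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[1.0, 0.2], [1.0, -0.2]],
    [[1.0, 0.0], [-1000.0, 1000.0]],  # behaves arbitrarily in the second round
]
direction = robust_direction(agent_histories)
```

Even though the third agent's time-averaged gradient is badly corrupted, the median ignores it and normalization bounds the step length, so the resulting direction stays close to `[1, 0]`.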
预测|估计(9篇)
【1】 SpreadsheetCoder: Formula Prediction from Semi-structured Context 标题:SpreadsheetCoder:基于半结构化上下文的公式预测
作者:Xinyun Chen,Petros Maniatis,Rishabh Singh,Charles Sutton,Hanjun Dai,Max Lin,Denny Zhou 备注:Published in ICML 2021 链接:https://arxiv.org/abs/2106.15339 摘要:电子表格公式预测是一个重要的程序综合问题,在许多实际应用中有着广泛的应用。以前的工作通常使用输入-输出示例作为电子表格公式合成的规范,其中每个输入-输出对模拟电子表格中的一个单独行。然而,这个公式并不能完全捕捉真实世界电子表格中的丰富背景。首先,电子表格数据项被组织成表格,因此行和列不一定相互独立。此外,许多电子表格都包含标题,这些标题提供了单元格数据的高级描述。但是,以前的合成方法不将头作为规范的一部分。在这项工作中,我们提出了第一种方法合成电子表格公式从表格的背景,其中包括头部和半结构化表格数据。特别地,我们提出了电子表格编码,一个基于BERT的模型架构,用行和列两种格式来表示表格上下文。我们在一个大型的电子表格数据集上训练了我们的模型,并证明电子表格编码器达到了42.51%的top-1预测精度,这比不采用丰富表格上下文的基线有了很大的改进。与基于规则的系统相比,SpreadsheetCoder帮助82%以上的用户在googlesheets上编写公式。 摘要:Spreadsheet formula prediction has been an important program synthesis problem with many real-world applications. Previous works typically utilize input-output examples as the specification for spreadsheet formula synthesis, where each input-output pair simulates a separate row in the spreadsheet. However, this formulation does not fully capture the rich context in real-world spreadsheets. First, spreadsheet data entries are organized as tables, thus rows and columns are not necessarily independent from each other. In addition, many spreadsheet tables include headers, which provide high-level descriptions of the cell data. However, previous synthesis approaches do not consider headers as part of the specification. In this work, we present the first approach for synthesizing spreadsheet formulas from tabular context, which includes both headers and semi-structured tabular data. In particular, we propose SpreadsheetCoder, a BERT-based model architecture to represent the tabular context in both row-based and column-based formats. We train our model on a large dataset of spreadsheets, and demonstrate that SpreadsheetCoder achieves top-1 prediction accuracy of 42.51%, which is a considerable improvement over baselines that do not employ rich tabular context. Compared to the rule-based system, SpreadsheetCoder assists 82% more users in composing formulas on Google Sheets.
【2】 Soft Attention: Does it Actually Help to Learn Social Interactions in Pedestrian Trajectory Prediction? 标题:软注意:在行人轨迹预测中学习社会互动真的有帮助吗?
作者:Laurent Boucaud,Daniel Aloise,Nicolas Saunier 机构:Department of Computer Engineering 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 链接:https://arxiv.org/abs/2106.15321 摘要:我们考虑利用行人的运动历史和周围行人的运动历史(称为社会信息)预测行人未来路径的问题。自从关于Social-LSTM的开创性论文发表以来,深度学习已经成为用来模拟社会互动对行人运动影响的主要工具。这些模型能够学习社会互动的证明依赖于对这些模型的消融研究。通过两个标准度量,即平均位移误差和最终位移误差,比较了有无社会交互模块的模型。然而,这些复杂的模型最近被简单的等速方法所超越。这让人质疑它们是否真的能够对社会互动建模,以及上述证明是否有效。本文主要研究具有软注意机制的深度学习模型在社会互动建模中的应用,并研究其是否在预测时使用社会信息。我们在ETH和UCY数据集上针对四种最先进的方法进行了两个实验,这些数据集在以前的工作中也使用过。首先,用随机噪声代替社会信息对模型进行训练,并与使用实际社会信息训练的模型进行比较。第二,我们使用了一个门控机制和一个$L_0$惩罚,允许模型关闭它们的内部组件。这些模型始终学会剪除它们的软注意机制。在两个实验中,收敛过程和预测性能都没有改变。这说明模型忽略了软注意机制,因而也忽略了社会信息。 摘要:We consider the problem of predicting the future path of a pedestrian using its motion history and the motion history of the surrounding pedestrians, called social information. Since the seminal paper on Social-LSTM, deep-learning has become the main tool used to model the impact of social interactions on a pedestrian's motion. The demonstration that these models can learn social interactions relies on an ablative study of these models. The models are compared with and without their social interactions module on two standard metrics, the Average Displacement Error and Final Displacement Error. Yet, these complex models were recently outperformed by a simple constant-velocity approach. This calls into question both whether they actually model social interactions and the validity of that demonstration. In this paper, we focus on the deep-learning models with a soft-attention mechanism for social interaction modeling and study whether they use social information at prediction time. We conduct two experiments across four state-of-the-art approaches on the ETH and UCY datasets, which were also used in previous work.
First, the models are trained by replacing the social information with random noise and compared to models trained with actual social information. Second, we use a gating mechanism along with an $L_0$ penalty, allowing models to shut down their inner components. The models consistently learn to prune their soft-attention mechanism. For both experiments, neither the course of the convergence nor the prediction performance was altered. This demonstrates that the soft-attention mechanism and therefore the social information are ignored by the models.
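The two metrics and the constant-velocity baseline mentioned in the abstract above are simple enough to write down. The sketch below uses the usual definitions (ADE as the mean Euclidean error over predicted steps, FDE as the error at the last step) and a baseline that extrapolates the last observed displacement; the toy trajectory is made up, and the exact baseline used in the literature the abstract refers to may differ in detail.

```python
import math


def constant_velocity_predict(history, horizon):
    """Extrapolate the last observed per-step displacement for `horizon` steps."""
    (x0, y0), (x1, y1) = history[-2], history[-1]
    vx, vy = x1 - x0, y1 - y0
    return [(x1 + vx * t, y1 + vy * t) for t in range(1, horizon + 1)]


def ade(pred, truth):
    """Average Displacement Error: mean Euclidean distance over all steps."""
    return sum(math.dist(p, q) for p, q in zip(pred, truth)) / len(truth)


def fde(pred, truth):
    """Final Displacement Error: Euclidean distance at the last step."""
    return math.dist(pred[-1], truth[-1])


# Hypothetical pedestrian: walks along x, then drifts upward by one unit.
history = [(0.0, 0.0), (1.0, 0.0)]
truth = [(2.0, 0.0), (3.0, 1.0)]
pred = constant_velocity_predict(history, horizon=2)
ade_value = ade(pred, truth)
fde_value = fde(pred, truth)
```

Because the baseline uses no information about neighbouring pedestrians at all, comparing its ADE/FDE with those of a social model is a cheap sanity check of the kind the abstract alludes to.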
【3】 Convolutional Sparse Coding Fast Approximation with Application to Seismic Reflectivity Estimation 标题:卷积稀疏编码快速逼近及其在地震反射率估计中的应用
作者:Deborah Pereg,Israel Cohen,Anthony A. Vassiliou 链接:https://arxiv.org/abs/2106.15296 摘要:在稀疏编码中,我们试图提取输入向量的特征,假设数据本身是基本构造块的稀疏叠加。类似地,神经网络通过学习训练数据集的特征来执行给定的任务。近年来,数据驱动和模型驱动的特征提取方法得到了广泛的应用,并取得了显著的效果。然而,实际实现往往太慢,无法在实际场景中使用,特别是对于实时应用程序。我们提出了一个经典迭代阈值算法的加速升级版本,在2-5次迭代中产生了卷积稀疏码的良好逼近。速度优势主要来自于观察到大多数解算器都被低效的全局阈值降低了速度。其主要思想是在应用阈值之前,通过局部感受野能量对每个数据点进行归一化。通过这种方式,可以抑制对强特征表达式的自然倾向,从而可以依赖于在训练期间容易逼近或学习的全局阈值。所提出的算法可以用于已知的预定词典,也可以用于经过训练的词典。训练后的版本被实现为一个神经网络,作为所提出的解算器的展开。通过在合成和真实数据情况下的地震反演问题,验证了该方法的有效性。为稳定的支护回收提供了理论保证。也就是说,我们证明了在一定条件下,真正的支持是完全恢复在第一次迭代。 摘要:In sparse coding, we attempt to extract features of input vectors, assuming that the data is inherently structured as a sparse superposition of basic building blocks. Similarly, neural networks perform a given task by learning features of the training data set. Recently both data-driven and model-driven feature extracting methods have become extremely popular and have achieved remarkable results. Nevertheless, practical implementations are often too slow to be employed in real-life scenarios, especially for real-time applications. We propose a speed-up upgraded version of the classic iterative thresholding algorithm, that produces a good approximation of the convolutional sparse code within 2-5 iterations. The speed advantage is gained mostly from the observation that most solvers are slowed down by inefficient global thresholding. The main idea is to normalize each data point by the local receptive field energy, before applying a threshold. This way, the natural inclination towards strong feature expressions is suppressed, so that one can rely on a global threshold that can be easily approximated, or learned during training. The proposed algorithm can be employed with a known predetermined dictionary, or with a trained dictionary. The trained version is implemented as a neural net designed as the unfolding of the proposed solver. 
The performance of the proposed solution is demonstrated via the seismic inversion problem in both synthetic and real data scenarios. We also provide theoretical guarantees for a stable support recovery. Namely, we prove that under certain conditions the true support is perfectly recovered within the first iteration.
【4】 Predicting the Solar Potential of Rooftops using Image Segmentation and Structured Data 标题:基于图像分割和结构化数据的屋顶太阳能潜力预测
作者:Daniel de Barros Soares,François Andrieux,Bastien Hell,Julien Lenhardt,Jordi Badosa,Sylvain Gavoille,Stéphane Gaiffas,Emmanuel Bacry 机构:namR, Paris, France, ENSTA Paris, LMD, Ecole polytechnique, IP Paris, Palaiseau, France, LPSM, Université de Paris, DMA, Ecole normale supérieure, CEREMADE, Université Paris Dauphine 链接:https://arxiv.org/abs/2106.15268 摘要:估算屋顶光伏发电系统的发电量是一个耗时的过程,需要现场测量,这是一项难以大规模实现的任务。在本文中,我们提出了一种方法来估计屋顶太阳能潜力的基础上,他们的位置和建筑特点,以及他们每年收到的太阳辐射量。该方法一方面利用计算机视觉实现屋顶截面和屋顶对象的语义分割,另一方面利用基于结构化建筑特征的机器学习模型预测屋顶坡度。然后,我们用几何方法计算了可以安装在屋顶上的太阳能电池板的方位角和最大数量。最后,我们计算出精确的遮光掩模,并将其与太阳辐射数据相结合,使我们能够估计屋顶的年太阳能潜力。 摘要:Estimating the amount of electricity that can be produced by rooftop photovoltaic systems is a time-consuming process that requires on-site measurements, a difficult task to achieve on a large scale. In this paper, we present an approach to estimate the solar potential of rooftops based on their location and architectural characteristics, as well as the amount of solar radiation they receive annually. Our technique uses computer vision to achieve semantic segmentation of roof sections and roof objects on the one hand, and a machine learning model based on structured building features to predict roof pitch on the other hand. We then compute the azimuth and maximum number of solar panels that can be installed on a rooftop with geometric approaches. Finally, we compute precise shading masks and combine them with solar irradiation data that enables us to estimate the yearly solar potential of a rooftop.
【5】 Convolutional Hypercomplex Embeddings for Link Prediction 标题:用于链路预测的卷积超复数嵌入
作者:Caglar Demir,Diego Moussallem,Stefan Heindorf,Axel-Cyrille Ngonga Ngomo 机构:Data Science Research Group, Paderborn University, Globo, Rio de Janeiro, Brazil 链接:https://arxiv.org/abs/2106.15230 摘要:知识图嵌入的研究主要集中在两个最小的赋范除代数$\mathbb{R}$和$\mathbb{C}$。最近的研究结果表明,四元数值嵌入的三线性积是解决链路预测问题的一种更有效的方法。此外,基于实值嵌入的卷积模型通常会产生最先进的链路预测结果。本文研究了卷积运算与超复数乘法的组合。我们提出了QMult、OMult、ConvQ和ConvO四种方法来解决链路预测问题。QMult和OMult可以看作是以前最先进方法(包括DistMult和ComplEx)的四元数和八元数扩展。ConvQ和ConvO建立在QMult和OMult的基础上,以一种受残差学习框架启发的方式引入卷积运算。我们在七个链路预测数据集(包括WN18RR、FB15K-237和YAGO3-10)上评估了我们的方法。实验结果表明,学习超复数值向量表示的好处随着知识图的大小和复杂性的增加而变得更加明显。ConvO在FB15K-237上的MRR、Hit@1和Hit@3指标优于最先进的方法,而QMult、OMult、ConvQ和ConvO在YAGO3-10上的所有指标均优于最先进的方法。结果还表明,通过预测平均可以进一步提高链路预测性能。为了促进可复现的研究,我们提供了这些方法的开源实现,包括训练和评估脚本以及预训练模型。 摘要:Knowledge graph embedding research has mainly focused on the two smallest normed division algebras, $\mathbb{R}$ and $\mathbb{C}$. Recent results suggest that trilinear products of quaternion-valued embeddings can be a more effective means to tackle link prediction. In addition, models based on convolutions on real-valued embeddings often yield state-of-the-art results for link prediction. In this paper, we investigate a composition of convolution operations with hypercomplex multiplications. We propose the four approaches QMult, OMult, ConvQ and ConvO to tackle the link prediction problem. QMult and OMult can be considered as quaternion and octonion extensions of previous state-of-the-art approaches, including DistMult and ComplEx. ConvQ and ConvO build upon QMult and OMult by including convolution operations in a way inspired by the residual learning framework. We evaluated our approaches on seven link prediction datasets including WN18RR, FB15K-237 and YAGO3-10. Experimental results suggest that the benefits of learning hypercomplex-valued vector representations become more apparent as the size and complexity of the knowledge graph grows.
ConvO outperforms state-of-the-art approaches on FB15K-237 in MRR, Hit@1 and Hit@3, while QMult, OMult, ConvQ and ConvO outperform state-of-the-art approaches on YAGO3-10 in all metrics. Results also suggest that link prediction performance can be further improved via prediction averaging. To foster reproducible research, we provide an open-source implementation of our approaches, including training and evaluation scripts as well as pretrained models.
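To make the quaternion multiplication behind QMult concrete, the following is a minimal sketch of the Hamilton product and a trilinear-style score for a single quaternion-valued triple. The names `hamilton` and `qmult_score`, the one-quaternion-per-embedding setup, and the choice of an inner product with the tail are illustrative assumptions, not the authors' implementation, which works on vectors of quaternions.

```python
from typing import Tuple

# A quaternion a + b*i + c*j + d*k stored as (a, b, c, d).
Quaternion = Tuple[float, float, float, float]


def hamilton(p: Quaternion, q: Quaternion) -> Quaternion:
    """Hamilton product p * q; note that it is not commutative."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )


def qmult_score(head: Quaternion, rel: Quaternion, tail: Quaternion) -> float:
    """Hypothetical score: inner product of (head * rel) with tail."""
    rotated = hamilton(head, rel)
    return sum(x * y for x, y in zip(rotated, tail))
```

With the basis elements `i = (0, 1, 0, 0)`, `j = (0, 0, 1, 0)` and `k = (0, 0, 0, 1)`, `hamilton(i, j)` gives `k` while `hamilton(j, i)` gives `-k`, which is the non-commutativity that distinguishes quaternion embeddings from real or complex ones.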
【6】 Regularized OFU: an Efficient UCB Estimator for Non-linear Contextual Bandit 标题:正则化OFU:一种用于非线性上下文赌博机的高效UCB估计器
作者:Yichi Zhou,Shihong Song,Huishuai Zhang,Jun Zhu,Wei Chen,Tie-Yan Liu 机构:Tsinghua University 链接:https://arxiv.org/abs/2106.15128 摘要:探索与利用(EE)的平衡是上下文赌博机中的一个基本问题。EE权衡的一个有力原则是面对不确定性的乐观原则(OFU),即智能体根据奖励的置信上界(UCB)采取行动。OFU已在线性/核上下文赌博机上达到了(接近)最优的遗憾界。然而,对于非线性的复杂任务,例如以深度神经网络作为奖励函数的上下文赌博机,如何得到高效且有效的EE权衡方法目前尚不清楚。本文提出了一种新的OFU算法,称为正则化OFU(ROFU)。在ROFU中,我们用一个可微函数来度量奖励的不确定性,并通过求解一个正则化优化问题来计算置信上界。我们证明了,对于多臂赌博机、核上下文赌博机和神经正切核赌博机,ROFU在特定的不确定性度量下达到(接近)最优的遗憾界,这从理论上证明了它在EE权衡上的有效性。重要的是,ROFU可以用基于梯度的优化器非常高效地实现,并且容易扩展到神经正切核之外的一般深度神经网络模型,这与以往的OFU方法形成鲜明对比。实证评估表明,ROFU在各种设置下的上下文赌博机上都表现得非常好。 摘要:Balancing exploration and exploitation (EE) is a fundamental problem in contextual bandit. One powerful principle for EE trade-off is Optimism in Face of Uncertainty (OFU), in which the agent takes the action according to an upper confidence bound (UCB) of reward. OFU has achieved (near-)optimal regret bound for linear/kernel contextual bandits. However, it is in general unknown how to derive efficient and effective EE trade-off methods for non-linear complex tasks, such as contextual bandit with deep neural network as the reward function. In this paper, we propose a novel OFU algorithm named regularized OFU (ROFU). In ROFU, we measure the uncertainty of the reward by a differentiable function and compute the upper confidence bound by solving a regularized optimization problem. We prove that, for multi-armed bandit, kernel contextual bandit and neural tangent kernel bandit, ROFU achieves (near-)optimal regret bounds with certain uncertainty measure, which theoretically justifies its effectiveness on EE trade-off. Importantly, ROFU admits a very efficient implementation with gradient-based optimizer, which easily extends to general deep neural network models beyond neural tangent kernel, in sharp contrast with previous OFU methods. The empirical evaluation demonstrates that ROFU works extremely well for contextual bandits under various settings.
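As background for the OFU principle in this entry, here is a minimal sketch of the classical UCB1 index for a multi-armed bandit: each arm is scored by its empirical mean plus a bonus of sqrt(2 ln t / n), and the highest score is played. This is the textbook rule, not the ROFU estimator proposed in the paper; the function name `ucb1_select` and its argument layout are illustrative assumptions.

```python
import math
from typing import List


def ucb1_select(counts: List[int], sums: List[float], t: int) -> int:
    """Return the arm with the largest UCB1 index at round t (t >= 1).

    counts[a] is how often arm a has been pulled and sums[a] is its total
    reward so far. Arms that were never pulled are tried first.
    """
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    indices = [
        sums[a] / counts[a] + math.sqrt(2.0 * math.log(t) / counts[a])
        for a in range(len(counts))
    ]
    # Optimism: pick the arm whose upper confidence bound is highest.
    return max(range(len(indices)), key=indices.__getitem__)
```

The bonus shrinks as an arm is pulled more often, so a rarely tried arm can be selected even when its empirical mean is lower. That is the exploration side of the trade-off the abstract refers to.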
【7】 Online Estimation and Coverage Control with Heterogeneous Sensing Information 标题:异质传感信息的在线估计与覆盖控制
作者:Andrew McDonald,Lai Wei,Vaibhav Srivastava 机构:McDonald is with the Department of Computer Science and Engineering, Michigan State University, Srivastava are with the Department of Electrical and Computer Engineering 备注:6 pages, 2 figures, accepted to IEEE CCTA'21 链接:https://arxiv.org/abs/2106.14984 摘要:异构多机器人传感系统比同构系统能够更全面地描述物理过程。对感官数据的多种形式的访问允许这样的系统在互补的来源之间融合信息,并学习对感兴趣的现象的更丰富的表示。通常,这些数据是相关的,但保真度不同,即准确度(偏差)和精密度(噪声)。低保真数据可能更丰富,而高保真数据可能更可信。本文通过结合低保真度和高保真度数据来学习和覆盖感兴趣的感官函数,来解决多机器人在线估计和覆盖控制问题。针对这一异构学习和覆盖任务,我们提出了两种算法,即多保真学习和覆盖的随机排序算法(SMLC)和多保真学习和覆盖的确定性排序算法(DMLC),并证明了它们的渐近收敛性。此外,我们还通过数值模拟验证了SMLC和DMLC的经验有效性。 摘要:Heterogeneous multi-robot sensing systems are able to characterize physical processes more comprehensively than homogeneous systems. Access to multiple modalities of sensory data allows such systems to fuse information between complementary sources and learn richer representations of a phenomenon of interest. Often, these data are correlated but vary in fidelity, i.e., accuracy (bias) and precision (noise). Low-fidelity data may be more plentiful, while high-fidelity data may be more trustworthy. In this paper, we address the problem of multi-robot online estimation and coverage control by combining low- and high-fidelity data to learn and cover a sensory function of interest. We propose two algorithms for this task of heterogeneous learning and coverage -- namely Stochastic Sequencing of Multi-fidelity Learning and Coverage (SMLC) and Deterministic Sequencing of Multi-fidelity Learning and Coverage (DMLC) -- and prove that they converge asymptotically. In addition, we demonstrate the empirical efficacy of SMLC and DMLC through numerical simulations.
【8】 Short-Term Load Forecasting for Smart Home Appliances with Sequence to Sequence Learning 标题:基于序列到序列学习的智能家电短期负荷预测
作者:Mina Razghandi,Hao Zhou,Melike Erol-Kantarci,Damla Turgut 机构:†Department of Computer Science, University of Central Florida, ‡School of Electrical Engineering and Computer Science, University of Ottawa 备注:Accepted by 2021 IEEE International Conference on Communications (ICC), copyright belongs to IEEE 链接:https://arxiv.org/abs/2106.15348 摘要:设备级负荷预测在住宅能源管理中起着至关重要的作用,对公用事业提供的辅助服务也具有重要意义。在本文中,我们建议使用一个基于LSTM的序列到序列(seq2seq)学习模型来捕捉设备的负载情况。我们使用四栋住宅楼的真实数据集,将我们提出的方案与其他三种技术,即VARMA、空洞一维卷积神经网络和LSTM模型进行比较,结果表明,基于LSTM的seq2seq模型在大多数情况下的预测误差优于其他技术。 摘要:Appliance-level load forecasting plays a critical role in residential energy management, besides having significant importance for ancillary services performed by the utilities. In this paper, we propose to use an LSTM-based sequence-to-sequence (seq2seq) learning model that can capture the load profiles of appliances. We use a real dataset collected from four residential buildings and compare our proposed scheme with three other techniques, namely VARMA, Dilated One Dimensional Convolutional Neural Network, and an LSTM model. The results show that the proposed LSTM-based seq2seq model outperforms other techniques in terms of prediction error in most cases.
【9】 Machine learning for plant microRNA prediction: A systematic review 标题:机器学习在植物microRNA预测中的系统评价
作者:Shyaman Jayasundara,Sandali Lokuge,Puwasuru Ihalagedara,Damayanthi Herath 机构: Department of Computer Engineering, University of Peradeniya, Peradeniya , Sri Lanka 链接:https://arxiv.org/abs/2106.15159 摘要:MicroRNAs(miRNAs)是一类内源性小分子非编码rna,在转录后基因调控中发挥重要作用。然而,miRNA序列和结构的实验测定既昂贵又耗时。因此,基于计算和机器学习的方法被用来预测新的microRNAs。随着数据科学和机器学习在生物学中的应用,人们进行了多种研究,寻找具有不同计算方法和不同miRNA特征的microrna。详细讨论了多种方法,包括所用的学习算法、所考虑的特征、所用的数据集和评价标准。本文系统地综述了植物miRNA识别的机器学习方法。这将有助于研究人员对过去的研究有一个详细的了解,并找出解决过去研究中出现的缺点的新途径。我们的发现强调了植物特异性计算方法鉴定miRNA的必要性。 摘要:MicroRNAs (miRNAs) are endogenous small non-coding RNAs that play an important role in post-transcriptional gene regulation. However, the experimental determination of miRNA sequence and structure is both expensive and time-consuming. Therefore, computational and machine learning-based approaches have been adopted to predict novel microRNAs. With the involvement of data science and machine learning in biology, multiple research studies have been conducted to find microRNAs with different computational methods and different miRNA features. Multiple approaches are discussed in detail considering the learning algorithm/s used, features considered, dataset/s used and the criteria used in evaluations. This systematic review focuses on the machine learning methods developed for miRNA identification in plants. This will help researchers to gain a detailed idea about past studies and identify novel paths that solve drawbacks occurred in past studies. Our findings highlight the need for plant-specific computational methods for miRNA identification.
其他神经网络|深度学习|模型|建模(20篇)
【1】 Learning Task Informed Abstraction 标题:学习任务知情摘要
作者:Xiang Fu,Ge Yang,Pulkit Agrawal,Tommi Jaakkola 备注:8 pages, 12 figures 链接:https://arxiv.org/abs/2106.15612 摘要:现有的基于模型的强化学习方法在复杂的视觉场景中进行操作时,由于无法确定任务相关特征的优先级,因而存在一定的困难。为了缓解这个问题,我们提出学习任务通知抽象(TIA),明确区分奖励相关的视觉特征和分心。对于TIA的学习,我们引入了任务通知MDP(TiMDP)的形式化方法,该方法通过训练两个通过合作重建学习视觉特征的模型来实现,但其中一个模型与奖赏信号是敌对分离的。实验结果表明,在许多视觉控制任务中,TIA比最先进的方法有显著的性能提高,而这些任务中自然和无约束的视觉分心是一个巨大的挑战。 摘要:Current model-based reinforcement learning methods struggle when operating from complex visual scenes due to their inability to prioritize task-relevant features. To mitigate this problem, we propose learning Task Informed Abstractions (TIA) that explicitly separates reward-correlated visual features from distractors. For learning TIA, we introduce the formalism of Task Informed MDP (TiMDP) that is realized by training two models that learn visual features via cooperative reconstruction, but one model is adversarially dissociated from the reward signal. Empirical evaluation shows that TIA leads to significant performance gains over state-of-the-art methods on many visual control tasks where natural and unconstrained visual distractions pose a formidable challenge.
【2】 The Values Encoded in Machine Learning Research 标题:机器学习研究中的编码价值
作者:Abeba Birhane,Pratyusha Kalluri,Dallas Card,William Agnew,Ravit Dotan,Michelle Bao 机构:University College Dublin & Lero, Dublin, Ireland, Stanford University, University of Washington, University of California, Berkeley 备注:Data and code available at this https URL 链接:https://arxiv.org/abs/2106.15590 摘要:机器学习(ML)目前对世界产生了巨大的影响,越来越多地影响到社区和机构实践。因此,至关重要的是,我们质疑该领域作为价值中立或普遍有益的模糊概念,并调查该领域正在推进的具体价值。在本文中,我们提出了一个严格的审查价值领域的定量和定性分析100高引用ML论文发表在总理ML会议,ICML和神经。我们注释了论文的关键特征,这些特征揭示了他们的价值:他们如何证明他们选择项目的合理性,他们提升了哪些方面,他们对潜在负面后果的考虑,以及他们的机构联系和资金来源。我们发现,社会需求与项目选择之间的联系通常非常松散(如果有提及的话),而且考虑负面后果的情况极为罕见。我们确定了在机器学习研究中被提升的67个价值观,其中,我们发现论文最经常基于表现、概括、效率、研究者理解、新颖性和基于先前工作的基础来证明和评估自己。我们提供了大量的文本证据,并分析了这些价值观是如何运作的。值得注意的是,我们发现,这些最高价值观中的每一个目前都在定义和应用,其假设和含义通常支持权力集中。最后,我们发现这些被高度引用的论文与科技公司和精英大学之间的联系越来越密切。 摘要:Machine learning (ML) currently exerts an outsized influence on the world, increasingly affecting communities and institutional practices. It is therefore critical that we question vague conceptions of the field as value-neutral or universally beneficial, and investigate what specific values the field is advancing. In this paper, we present a rigorous examination of the values of the field by quantitatively and qualitatively analyzing 100 highly cited ML papers published at premier ML conferences, ICML and NeurIPS. We annotate key features of papers which reveal their values: how they justify their choice of project, which aspects they uplift, their consideration of potential negative consequences, and their institutional affiliations and funding sources. We find that societal needs are typically very loosely connected to the choice of project, if mentioned at all, and that consideration of negative consequences is extremely rare. We identify 67 values that are uplifted in machine learning research, and, of these, we find that papers most frequently justify and assess themselves based on performance, generalization, efficiency, researcher understanding, novelty, and building on previous work. 
We present extensive textual evidence and analysis of how these values are operationalized. Notably, we find that each of these top values is currently being defined and applied with assumptions and implications generally supporting the centralization of power. Finally, we find increasingly close ties between these highly cited papers and tech companies and elite universities.
【3】 Curious Explorer: a provable exploration strategy in Policy Learning 标题:好奇的探索者:政策学习中的一种可证明的探索策略
作者:Marco Miani,Maurizio Parton,Marco Romito 机构:University of Pisa, University of Chieti-Pescara 链接:https://arxiv.org/abs/2106.15503 摘要:对于策略梯度方法来说,访问一个具有探索性的重新启动分布(所谓的广覆盖假设)是至关重要的。这是因为,虽然目标函数对不太可能出现的状态的更新不敏感,但是代理仍然需要在这些状态中进行改进,以达到接近最优的收益。因此,在分析实际政策梯度方法的理论性质时,以某种形式使用了宽覆盖。然而,这种假设在某些环境中是不可行的,例如在线学习时,或者只能从固定的初始状态重新启动时。在这种情况下,经典的策略梯度算法的收敛性和样本效率都很差。在本文中,我们开发了好奇探索者,一种新颖而简单的迭代状态空间探索策略,可用于任何起始分布$\rho$。好奇资源管理器从$\rho$开始,然后使用分配给访问不佳的状态集的内在奖励生成一系列策略,每一个策略都比前一个策略更具探索性,并且以知情的方式,最后根据探索性策略的状态访问分布输出一个重新启动模型$\mu$。好奇的探索者是可以证明的,从这个意义上说,我们提供了一个最佳策略访问访问访问量很低的州的频率的理论上限。当PAC优化器插入浏览器时,这些边界可以用来证明PAC收敛性和样本效率结果。这使得在没有任何覆盖假设的情况下,可以实现全局收敛和样本效率结果,并且对于任何其他策略梯度方法都有可能确保PAC在大覆盖范围内的收敛。最后,我们将好奇资源管理器的输出插入到REINFORCE和TRPO中,并通过实验证明它可以提高MDPs中具有挑战性的探索的性能。 摘要:Having access to an exploring restart distribution (the so-called wide coverage assumption) is critical with policy gradient methods. This is due to the fact that, while the objective function is insensitive to updates in unlikely states, the agent may still need improvements in those states in order to reach a nearly optimal payoff. For this reason, wide coverage is used in some form when analyzing theoretical properties of practical policy gradient methods. However, this assumption can be unfeasible in certain environments, for instance when learning is online, or when restarts are possible only from a fixed initial state. In these cases, classical policy gradient algorithms can have very poor convergence properties and sample efficiency. In this paper, we develop Curious Explorer, a novel and simple iterative state space exploration strategy that can be used with any starting distribution $\rho$. Curious Explorer starts from $\rho$, then using intrinsic rewards assigned to the set of poorly visited states produces a sequence of policies, each one more exploratory than the previous one in an informed way, and finally outputs a restart model $\mu$ based on the state visitation distribution of the exploratory policies. 
Curious Explorer is provable, in the sense that we provide theoretical upper bounds on how often an optimal policy visits poorly visited states. These bounds can be used to prove PAC convergence and sample efficiency results when a PAC optimizer is plugged in Curious Explorer. This allows to achieve global convergence and sample efficiency results without any coverage assumption for REINFORCE, and potentially for any other policy gradient method ensuring PAC convergence with wide coverage. Finally, we plug (the output of) Curious Explorer into REINFORCE and TRPO, and show empirically that it can improve performance in MDPs with challenging exploration.
【4】 A Convergent and Efficient Deep Q Network Algorithm 标题:一种收敛高效的深度Q网络算法
作者:Zhikang T. Wang,Masahito Ueda 机构:Department of Physics and, Institute for Physics of Intelligence, University of Tokyo, RIKEN Center for Emergent Matter Science (CEMS) 链接:https://arxiv.org/abs/2106.15419 摘要:尽管deepq网络(DQN)强化学习算法及其变体在经验上取得了成功,但DQN仍然没有得到很好的理解,也不能保证其收敛性。在这项工作中,我们表明,DQN可以发散,并停止在现实环境中运行。尽管存在基于梯度的收敛方法,但我们证明了它们在学习行为上存在固有的问题,并阐明了它们在实践中经常失败的原因。为了克服这些问题,我们通过对DQN算法的仔细修改,提出了一种收敛的DQN算法(C-DQN),并证明了该算法是收敛的,并且可以处理较大的折扣因子(0.9998)。它在困难的环境中学习能力很强,并且可以在Atari 2600基准测试中学习一些DQN失败的困难游戏,并且计算预算适中。我们的代码已经公开发布,可以用来复制我们的结果。 摘要:Despite the empirical success of the deep Q network (DQN) reinforcement learning algorithm and its variants, DQN is still not well understood and it does not guarantee convergence. In this work, we show that DQN can diverge and cease to operate in realistic settings. Although there exist gradient-based convergent methods, we show that they actually have inherent problems in learning behaviour and elucidate why they often fail in practice. To overcome these problems, we propose a convergent DQN algorithm (C-DQN) by carefully modifying DQN, and we show that the algorithm is convergent and can work with large discount factors (0.9998). It learns robustly in difficult settings and can learn several difficult games in the Atari 2600 benchmark where DQN fail, within a moderate computational budget. Our codes have been publicly released and can be used to reproduce our results.
【5】 Automated Evolutionary Approach for the Design of Composite Machine Learning Pipelines 标题:组合机器学习流水线设计的自动进化方法
作者:Nikolay O. Nikitin,Pavel Vychuzhanin,Mikhail Sarafanov,Iana S. Polonskaia,Ilia Revin,Irina V. Barabanova,Gleb Maximov,Anna V. Kalyuzhnaya,Alexander Boukhanovsky 机构:ITMO University, Saint-Petersburg, Russia 链接:https://arxiv.org/abs/2106.15397 摘要:机器学习方法在实际任务中的有效性取决于建模管道的适当结构。该方法旨在实现复合机器学习流水线的自动化设计,相当于由模型和数据操作组成的计算工作流。该方法结合了自动机器学习和工作流管理系统的关键思想。它采用可定制的基于图形的结构设计管道,分析得到的结果,并再现它们。采用进化方法对管道结构进行柔性识别。此外,还实现了灵敏度分析、原子化和超参数调整等算法,以提高该方法的有效性。同时,基于这种方法的软件实现也以开源框架的形式呈现。这组实验是针对不同的数据集和任务(分类、回归、时间序列预测)进行的。通过与现有竞争对手和基线解的比较,验证了该方法的正确性和有效性。 摘要:The effectiveness of the machine learning methods for real-world tasks depends on the proper structure of the modeling pipeline. The proposed approach is aimed to automate the design of composite machine learning pipelines, which is equivalent to computation workflows that consist of models and data operations. The approach combines key ideas of both automated machine learning and workflow management systems. It designs the pipelines with a customizable graph-based structure, analyzes the obtained results, and reproduces them. The evolutionary approach is used for the flexible identification of pipeline structure. The additional algorithms for sensitivity analysis, atomization, and hyperparameter tuning are implemented to improve the effectiveness of the approach. Also, the software implementation on this approach is presented as an open-source framework. The set of experiments is conducted for the different datasets and tasks (classification, regression, time series forecasting). The obtained results confirm the correctness and effectiveness of the proposed approach in the comparison with the state-of-the-art competitors and baseline solutions.
【6】 LB-CNN: An Open Source Framework for Fast Training of Light Binary Convolutional Neural Networks using Chainer and Cupy 标题:LB-CNN:利用Chainer和Cupy快速训练轻型二进制卷积神经网络的开源框架
作者:Radu Dogaru,Ioana Dogaru 机构:Dept. of Applied and Information Engineering, University “Politehnica” of Bucharest, Bucharest, Romania 备注:6 pages, includes reference to code (Jupyter - Python notebook) 链接:https://arxiv.org/abs/2106.15350 摘要:轻二进制卷积神经网络(LB-CNN)在许多工业应用中需要在低能耗计算平台上实现时特别有用。本文介绍了一种优化紧凑LB-CNN的框架,并对其有效性进行了评价。该框架是免费提供的,可以在免费访问的云平台上运行,因此不需要重大投资。优化后的模型以标准化的.h5格式保存,可以作为专用工具的输入,进一步部署到特定技术中,从而实现各种智能图像传感器的快速发展。加速我们模型优化的主要因素,特别是二进制卷积核的选择,是Chainer/Cupy机器学习库,它为将输出层训练成一个极限学习机提供了显著的加速。包括使用Keras/Tensorflow对输出层进行额外的训练,因为这样可以提高精度。对于广泛使用的数据集,包括MNIST、GTSRB、ORL和VGG,结果显示在精确度和复杂性之间有很好的折衷。特别是,对于人脸识别问题,经过仔细优化的LB-CNN模型提供了高达100%的准确率。这种TinyML解决方案非常适合需要低能耗图像识别的工业应用。 摘要:Light binary convolutional neural networks (LB-CNN) are particularly useful when implemented in low-energy computing platforms as required in many industrial applications. Herein, a framework for optimizing compact LB-CNN is introduced and its effectiveness is evaluated. The framework is freely available and may run on free-access cloud platforms, thus requiring no major investments. The optimized model is saved in the standardized .h5 format and can be used as input to specialized tools for further deployment into specific technologies, thus enabling the rapid development of various intelligent image sensors. The main ingredient in accelerating the optimization of our model, particularly the selection of binary convolution kernels, is the Chainer/Cupy machine learning library offering significant speed-ups for training the output layer as an extreme-learning machine. Additional training of the output layer using Keras/Tensorflow is included, as it allows an increase in accuracy. Results for widely used datasets including MNIST, GTSRB, ORL, VGG show very good compromise between accuracy and complexity. Particularly, for face recognition problems a carefully optimized LB-CNN model provides up to 100% accuracies. 
Such TinyML solutions are well suited for industrial applications requiring image recognition with low energy consumption.
【7】 Differential Privacy for Credit Risk Model 标题:信用风险模型的差分隐私性
作者:Tabish Maniar,Alekhya Akkinepally,Anantha Sharma 机构:Synechron Innovation Lab 备注:7 pages, 3 figures, 2 tables 链接:https://arxiv.org/abs/2106.15343 摘要:使用机器学习算法来模拟用户行为和驱动业务决策已经变得越来越普遍,特别是为自动化决策提供智能建议。这导致越来越多地使用客户的个人数据来分析客户行为和预测他们对公司产品的兴趣。越来越多地使用这些客户个人数据可以带来更好的模型,但也可能导致客户数据被泄露、逆向工程和错误处理。在本文中,我们评估差分隐私作为解决这些隐私问题的方案,将隐私保护纳入预测模型开发的数据工程和模型训练阶段。我们感兴趣的是在操作环境中的实用实现,这需要一个通用的差分隐私建模框架,我们评估了LeapYear的一个此类工具在信用风险建模领域的应用。信用风险模型是银行业和金融业的一种主要建模方法,通过分析用户数据来确定银行的总预期损失。我们研究了差分隐私在信用风险模型中的应用,并比较了差分隐私模型和非差分隐私模型的性能。 摘要:The use of machine learning algorithms to model user behavior and drive business decisions has become increasingly commonplace, specifically providing intelligent recommendations to automated decision making. This has led to an increase in the use of customers' personal data to analyze customer behavior and predict their interests in a company's products. Increased use of this customer personal data can lead to better models but also to the potential of customer data being leaked, reverse engineered, and mishandled. In this paper, we assess differential privacy as a solution to address these privacy problems by building privacy protections into the data engineering and model training stages of predictive model development. Our interest is a pragmatic implementation in an operational environment, which necessitates a general purpose differentially private modeling framework, and we evaluate one such tool from LeapYear as applied to the Credit Risk modeling domain. Credit Risk Model is a major modeling methodology in banking and finance where user data is analyzed to determine the total Expected Loss to the bank. We examine the application of differential privacy on the credit risk model and compare the performance of a Differentially Private Model with that of a Non Differentially Private Model.
【8】 Joint Learning of Portrait Intrinsic Decomposition and Relighting 标题:人像本征分解与重光照的联合学习
作者:Mona Zehni,Shaona Ghosh,Krishna Sridhar,Sethu Raman 机构:Department of ECE and CSL, University of Illinois at Urbana-Champaign, Apple Inc. 链接:https://arxiv.org/abs/2106.15305 摘要:逆渲染是将图像分解为其固有成分的问题,即反照率、法线和光照。为了解决单幅图像的不适定问题,现有的从阴影到形状的方法大多是对合成或真实数据集上的所有分量进行有监督的训练。在这里,我们提出了一种新的自我监督训练范式,即1)减少了对分解任务的全面监督,2)考虑了重照明任务。我们引入了新的自监督损失项,利用多照明图像(不同照明下相同场景的图像)之间的一致性。我们的方法适用于多光源数据集。我们在两种情况下应用我们的训练方法:1)在合成和真实数据的混合上训练,2)在有限的监督下在真实数据集上训练。我们展示了我们的训练范式在内在分解和重照明两方面的有效性,并展示了在有限的监督设置下,模型如何在没有自我监督损失项的情况下在两个任务中挣扎。我们提供了在SfSNet、CelebA和Photoface数据集上的综合实验结果,并在野外图像上验证了我们的方法的性能。 摘要:Inverse rendering is the problem of decomposing an image into its intrinsic components, i.e. albedo, normal and lighting. To solve this ill-posed problem from single image, state-of-the-art methods in shape from shading mostly resort to supervised training on all the components on either synthetic or real datasets. Here, we propose a new self-supervised training paradigm that 1) reduces the need for full supervision on the decomposition task and 2) takes into account the relighting task. We introduce new self-supervised loss terms that leverage the consistencies between multi-lit images (images of the same scene under different illuminations). Our approach is applicable to multi-lit datasets. We apply our training approach in two settings: 1) train on a mixture of synthetic and real data, 2) train on real datasets with limited supervision. We show-case the effectiveness of our training paradigm on both intrinsic decomposition and relighting and demonstrate how the model struggles in both tasks without the self-supervised loss terms in limited supervision settings. We provide results of comprehensive experiments on SfSNet, CelebA and Photoface datasets and verify the performance of our approach on images in the wild.
【9】 VolterraNet: A higher order convolutional network with group equivariance for homogeneous manifolds 标题:VolterraNet:齐次流形的一种高阶群等方差卷积网络
作者:Monami Banerjee,Rudrasis Chakraborty,Jose Bouza,Baba C. Vemuri 机构:Chakraborty is with University of California, Vemuri are with University of Florida 备注:IEEE Transactions on Pattern Analysis and Machine Intelligence (2020) 链接:https://arxiv.org/abs/2106.15301 摘要:卷积神经网络由于其平移等变性,在基于图像的学习任务中取得了很大的成功。最近的工作将卷积神经网络的传统卷积层推广到非欧氏空间,并证明了广义卷积运算的群等变性。本文针对定义为黎曼齐次空间上函数样本的数据,提出了一种新的高阶Volterra卷积神经网络(VolterraNet)。与传统卷积的结果类似,我们证明了Volterra函数卷积对黎曼齐次空间所容许的等距群的作用是等变的,并且在一定的限制条件下,任何非线性等变函数都可以表示为我们的齐次空间Volterra卷积,推广了欧氏空间Volterra展开式的非线性位移等变特征。我们还证明了二阶函数卷积运算可以表示为级联卷积运算,从而得到一个高效的实现。除此之外,我们还提出了一个空洞(dilated)VolterraNet模型。这些进展使参数量相对于基线非欧几里德CNN大幅减少。为了证明VolterraNet的有效性,我们提供了几个真实数据实验,包括球形MNIST、原子能、Shrec17数据集上的分类任务,以及扩散MRI数据上的分组测试。同时还给出了与最先进方法的性能比较。 摘要:Convolutional neural networks have been highly successful in image-based learning tasks due to their translation equivariance property. Recent work has generalized the traditional convolutional layer of a convolutional neural network to non-Euclidean spaces and shown group equivariance of the generalized convolution operation. In this paper, we present a novel higher order Volterra convolutional neural network (VolterraNet) for data defined as samples of functions on Riemannian homogeneous spaces. Analogous to the result for traditional convolutions, we prove that the Volterra functional convolutions are equivariant to the action of the isometry group admitted by the Riemannian homogeneous spaces, and under some restrictions, any non-linear equivariant function can be expressed as our homogeneous space Volterra convolution, generalizing the non-linear shift equivariant characterization of Volterra expansions in Euclidean space. We also prove that second order functional convolution operations can be represented as cascaded convolutions which leads to an efficient implementation. Beyond this, we also propose a dilated VolterraNet model. These advances lead to large parameter reductions relative to baseline non-Euclidean CNNs.
To demonstrate the efficacy of the VolterraNet performance, we present several real data experiments involving classification tasks on spherical-MNIST, atomic energy, Shrec17 data sets, and group testing on diffusion MRI data. Performance comparisons to the state-of-the-art are also presented.
【10】 Evaluating Deep Neural Networks for Image Document Enhancement 标题:深度神经网络在图像文档增强中的评价
作者:Lucas N. Kirsten,Ricardo Piccoli,Ricardo Ribani 机构:Department of Print Software, HP Inc. – R&D, Porto Alegre – RS, Brazil 备注:12 pages, 6 figures, 2 tables, CBDAR conference 链接:https://arxiv.org/abs/2106.15286 摘要:这项工作评估了六种最先进的深度神经网络(DNN)架构在增强相机拍摄的文档图像问题上的应用。利用图像质量评价(IQA)指标对每个网络的结果进行了定性和定量评价,并与现有的基于传统计算机视觉技术的方法进行了比较。与现有算法相比,性能最好的体系结构通常产生了良好的增强效果,这表明使用DNNs进行文档图像增强是可能的。此外,性能最好的体系结构可以作为未来使用深度学习技术进行文档增强研究的基线。本文的主要贡献是:一个可以进一步改进以提供更好结果的深度学习技术的基线,以及一个使用IQA度量来定量比较神经网络生成的图像与真值(ground truth)的评估方法。 摘要:This work evaluates six state-of-the-art deep neural network (DNN) architectures applied to the problem of enhancing camera-captured document images. The results from each network were evaluated both qualitatively and quantitatively using Image Quality Assessment (IQA) metrics, and also compared with an existing approach based on traditional computer vision techniques. The best performing architectures generally produced good enhancement compared to the existing algorithm, showing that it is possible to use DNNs for document image enhancement. Furthermore, the best performing architectures could work as a baseline for future investigations on document enhancement using deep learning techniques. The main contributions of this paper are: a baseline of deep learning techniques that can be further improved to provide better results, and an evaluation methodology using IQA metrics for quantitatively comparing the produced images from the neural networks to a ground truth.
【11】 INN: A Method Identifying Clean-annotated Samples via Consistency Effect in Deep Neural Networks 标题:INN:一种基于一致性效应的深度神经网络清洁标注样本识别方法
作者:Dongha Kim,Yongchan Choi,Kunwoong Kim,Yongdai Kim 机构:Sungsin Women’s University, Seoul National University, Department of Statistics and, Graduate School of Data Science 备注:17 pages, 9 figures 链接:https://arxiv.org/abs/2106.15185 摘要:在许多分类问题中,收集大量干净的带注释的数据是不容易的,因此人们对带噪声标签的数据进行了大量的研究。最新的有噪声标签问题的解决方案是建立在利用记忆效应的小损失策略上的。虽然它是一个强大的工具,但记忆效应有几个缺点。模型性能对利用记忆效应所需的训练轮次(epoch)的选择非常敏感。此外,当标签被严重污染或不平衡时,记忆效应可能不会发生,在这种情况下,基于小损失策略的方法无法识别干净的标签数据。我们引入了一种新的方法INN(Integration with the Nearest Neighborhoods),从带有噪声标签的训练数据中提取干净的标签数据。该方法基于一个新的发现,即无论训练轮次如何,干净标记数据邻域的预测模式都与噪声标记数据邻域的预测模式存在一致的差异。INN方法需要更多的计算量,但比小损失策略更稳定和强大。通过各种实验,我们证明INN方法成功地解决了记忆效应的不足,有助于用带噪声标签的训练数据建立更精确的深度预测模型。 摘要:In many classification problems, collecting massive clean-annotated data is not easy, and thus much research has been done to handle data with noisy labels. Most recent state-of-the-art solutions for noisy label problems are built on the small-loss strategy which exploits the memorization effect. While it is a powerful tool, the memorization effect has several drawbacks. The performances are sensitive to the choice of a training epoch required for utilizing the memorization effect. In addition, when the labels are heavily contaminated or imbalanced, the memorization effect may not occur in which case the methods based on the small-loss strategy fail to identify clean labeled data. We introduce a new method called INN (Integration with the Nearest Neighborhoods) to refine clean labeled data from training data with noisy labels. The proposed method is based on a new discovery that a prediction pattern at neighbor regions of clean labeled data is consistently different from that of noisy labeled data regardless of training epochs. The INN method requires more computation but is much more stable and powerful than the small-loss strategy.
By carrying out various experiments, we demonstrate that the INN method resolves the shortcomings in the memorization effect successfully and thus is helpful to construct more accurate deep prediction models with training data with noisy labels.
【12】 Towards Generalisable Deep Inertial Tracking via Geometry-Aware Learning 标题:基于几何意识学习的通用型深惯性跟踪
作者:Mohammed Alloulah,Maximilian Arnold,Anton Isopoussu 机构:†Bell Labs, ‡Invenia Labs 备注:Draft 链接:https://arxiv.org/abs/2106.15178 摘要:无仪器和无准备环境下的自主导航是下一代室内外位置服务的基本需求。为了实现这种雄心壮志,需要一套协作的传感模式,以便在不受挑战的动态条件影响的情况下保持性能。在现有的许多模式中,惯性跟踪由于其独立于周围环境,在暂时不利的操作条件下发挥着关键作用。然而,惯性跟踪传统上(i)遭受过多的误差增长和(ii)需要广泛和繁琐的调整。这两个问题限制了惯性跟踪的吸引力和实用性。本文提出了一种新的深度学习惯性跟踪系统DIT,它克服了以往的局限性;也就是说,通过(i)显著减少跟踪漂移和(ii)无缝地构建鲁棒和可推广的学习模型。DIT描述了两个核心贡献:(i)DIT采用了一个机械滑块子系统增强的机器人平台,该子系统可自动对不同传感器安装几何形状产生的惯性信号变量进行采样。我们利用该平台在内部策划了720万个样本数据集,覆盖21公里的总距离,分为11个索引传感器安装几何体(ii)DIT使用深度学习、最佳传输和域自适应(DA)来创建一个对传感器安装几何结构中的变化具有鲁棒性的模型。整个系统以端到端的机器人学习方式综合高性能和通用的惯性导航模型。在我们的评估中,DIT在性能上优于工业级传感器融合基线10倍(第90百分位),在训练时间上优于最先进的对抗性DA技术2.5倍(第90百分位)和10倍以上。 摘要:Autonomous navigation in uninstrumented and unprepared environments is a fundamental demand for next generation indoor and outdoor location-based services. To bring about such ambition, a suite of collaborative sensing modalities is required in order to sustain performance irrespective of challenging dynamic conditions. Of the many modalities on offer, inertial tracking plays a key role under momentary unfavourable operational conditions owing to its independence of the surrounding environment. However, inertial tracking has traditionally (i) suffered from excessive error growth and (ii) required extensive and cumbersome tuning. Both of these issues have limited the appeal and utility of inertial tracking. In this paper, we present DIT: a novel Deep learning Inertial Tracking system that overcomes prior limitations; namely, by (i) significantly reducing tracking drift and (ii) seamlessly constructing robust and generalisable learned models. DIT describes two core contributions: (i) DIT employs a robotic platform augmented with a mechanical slider subsystem that automatically samples inertial signal variabilities arising from different sensor mounting geometries. 
We use the platform to curate in-house a 7.2 million sample dataset covering an aggregate distance of 21 kilometres split into 11 indexed sensor mounting geometries. (ii) DIT uses deep learning, optimal transport, and domain adaptation (DA) to create a model which is robust to variabilities in sensor mounting geometry. The overall system synthesises high-performance and generalisable inertial navigation models in an end-to-end, robotic-learning fashion. In our evaluation, DIT outperforms an industrial-grade sensor fusion baseline by 10x (90th percentile) and a state-of-the-art adversarial DA technique by > 2.5x in performance (90th percentile) and >10x in training time.
【13】 Learning from Multiple Annotators by Incorporating Instance Features
Authors: Jingzheng Li, Hailong Sun, Jiyi Li, Zhijun Chen, Renshuai Tao, Yufei Ge Affiliations: Beihang University, University of Yamanashi, Northeast Normal University Link: https://arxiv.org/abs/2106.15146 Abstract: Learning from multiple annotators aims to induce a high-quality classifier from training instances, where each of them is associated with a set of possibly noisy labels provided by multiple annotators under the influence of their varying abilities and own biases. In modeling the probability transition process from latent true labels to observed labels, most existing methods adopt class-level confusion matrices of annotators, under which observed labels do not depend on the instance features and are determined only by the true labels. This may limit the performance that the classifier can achieve. In this work, we propose the noise transition matrix, which incorporates the influence of instance features on annotators' performance based on confusion matrices. Furthermore, we propose a simple yet effective learning framework, which consists of a classifier module and a noise transition matrix module in a unified neural network architecture. Experimental results demonstrate the superiority of our method in comparison with state-of-the-art methods.
【14】 Certifiable Machine Unlearning for Linear Models
Authors: Ananth Mahadevan, Michael Mathioudakis Affiliations: University of Helsinki, Helsinki, Finland Link: https://arxiv.org/abs/2106.15093 Abstract: Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. Methods for this task should combine effectiveness and efficiency, i.e., they should effectively "unlearn" deleted data, but in a way that does not require excessive computational effort (e.g., a full retraining) for a small number of deletions. Such a combination is typically achieved by tolerating some amount of approximation in the unlearning. In addition, laws and regulations in the spirit of "the right to be forgotten" have given rise to requirements for certifiability, i.e., the ability to demonstrate that the deleted data has indeed been unlearned by the ML model. In this paper, we present an experimental study of the three state-of-the-art approximate unlearning methods for linear models and demonstrate the trade-offs between efficiency, effectiveness and certifiability offered by each method. In implementing the study, we extend some of the existing works and describe a common ML pipeline to compare and evaluate the unlearning methods on six real-world datasets and a variety of settings. We provide insights into the effect of the quantity and distribution of the deleted data on ML models and the performance of each unlearning method in different settings. We also propose a practical online strategy to determine when the accumulated error from approximate unlearning is large enough to warrant a full retrain of the ML model.
【15】 Unified Framework for Spectral Dimensionality Reduction, Maximum Variance Unfolding, and Kernel Learning By Semidefinite Programming: Tutorial and Survey
Authors: Benyamin Ghojogh, Ali Ghodsi, Fakhri Karray, Mark Crowley Affiliations: Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada, Department of Statistics and Actuarial Science & David R. Cheriton School of Computer Science Comments: To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning Link: https://arxiv.org/abs/2106.15379 Abstract: This is a tutorial and survey paper on unification of spectral dimensionality reduction methods, kernel learning by Semidefinite Programming (SDP), Maximum Variance Unfolding (MVU) or Semidefinite Embedding (SDE), and its variants. We first explain how the spectral dimensionality reduction methods can be unified as kernel Principal Component Analysis (PCA) with different kernels. This unification can be interpreted as eigenfunction learning or as a representation of the kernel in terms of the distance matrix. Then, since the spectral methods are unified as kernel PCA, we turn to learning the best kernel for unfolding the manifold of data to its maximum variance. We first briefly introduce kernel learning by SDP for the transduction task. Then, we explain MVU in detail. Various versions of supervised MVU using a nearest neighbors graph, by class-wise unfolding, by Fisher criterion, and by colored MVU are explained. We also explain out-of-sample extension of MVU using eigenfunctions and kernel mapping. Finally, we introduce other variants of MVU including action respecting embedding, relaxed MVU, and landmark MVU for big data.
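As a small illustration of what "representing the kernel in terms of the distance matrix" can look like, the sketch below uses the double-centering relation from classical multidimensional scaling, K = -1/2 H D2 H with H = I - (1/n) 1 1^T, where D2 holds squared Euclidean distances. This is a standard identity chosen here for illustration; it is an assumption that it matches the construction in the tutorial itself. The function name and the toy data are hypothetical.

```python
def gram_from_squared_distances(d2):
    """Double-center a squared-distance matrix: K = -1/2 * H * D2 * H.

    Illustrative only: classical-MDS identity, not taken from the paper.
    """
    n = len(d2)
    row_mean = [sum(row) / n for row in d2]
    col_mean = [sum(d2[i][j] for i in range(n)) / n for j in range(n)]
    total_mean = sum(row_mean) / n
    return [
        [-0.5 * (d2[i][j] - row_mean[i] - col_mean[j] + total_mean)
         for j in range(n)]
        for i in range(n)
    ]


# Hypothetical toy data: three points on a line at x = 0, 1, 3.
points = [0.0, 1.0, 3.0]
d2 = [[(a - b) ** 2 for b in points] for a in points]
K = gram_from_squared_distances(d2)

# K equals the Gram matrix of the mean-centered points.
mean = sum(points) / len(points)
centered = [p - mean for p in points]
expected = [[a * b for b in centered] for a in centered]
```

For Euclidean distances the recovered matrix is the Gram matrix of the centered data, so its eigendecomposition gives the same embedding as PCA on the original points.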
【16】 Learning complex dependency structure of gene regulatory networks from high dimensional micro-array data with Gaussian Bayesian networks
Authors: Catharina Elisabeth Graafland, José Manuel Gutiérrez Affiliations: CSIC–Universidad de Cantabria Comments: 20 pages, 5 figures Link: https://arxiv.org/abs/2106.15365 Abstract: Gene expression datasets consist of thousands of genes with relatively small sample sizes (i.e., they are large-$p$-small-$n$). Moreover, dependencies of various orders co-exist in the datasets. In the Undirected probabilistic Graphical Model (UGM) framework, the Glasso algorithm has been proposed to deal with high dimensional micro-array datasets by forcing sparsity. Also, modifications of the default Glasso algorithm have been developed to overcome the problem of complex interaction structure. In this work we advocate the use of a simple score-based Hill Climbing algorithm (HC) that learns Gaussian Bayesian Networks (BNs) relying on Directed Acyclic Graphs (DAGs). We compare HC with Glasso and its modifications in the UGM framework on their capability to reconstruct gene regulatory networks (GRNs) from micro-array data belonging to the Escherichia coli genome. We benefit from the analytical properties of the Joint Probability Density (JPD) function on which both directed and undirected PGMs build to convert DAGs to UGMs. We conclude that dependencies in complex data are learned best by the HC algorithm, presenting them most accurately and efficiently, simultaneously modelling strong local and weaker but significant global connections coexisting in the gene expression dataset. The HC algorithm adapts intrinsically to the complex dependency structure of the dataset, without forcing a specific structure in advance. By contrast, Glasso and its modifications model unnecessary dependencies at the expense of the probabilistic information in the network, and they introduce a structural bias in the JPD function that can only be relieved by including many parameters.
【17】 FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Authors: Taejun Bak, Jae-Sung Bae, Hanbin Bae, Young-Ik Kim, Hoon-Young Cho Affiliations: Speech AI Lab, NCSOFT, Republic of Korea Comments: Accepted to INTERSPEECH 2021 Link: https://arxiv.org/abs/2106.15123 Abstract: Methods for modeling and controlling prosody with acoustic features have been proposed for neural text-to-speech (TTS) models. Prosodic speech can be generated by conditioning on acoustic features. However, synthesized speech with a large pitch-shift scale suffers from audio quality degradation and deformation of speaker characteristics. To address this problem, we propose a feed-forward Transformer based TTS model that is designed based on the source-filter theory. This model, called FastPitchFormant, has a unique structure that handles text and acoustic features in parallel. By modeling each feature separately, the tendency of the model to learn the relationship between the two features can be mitigated.
【18】 Attaining entropy production and dissipation maps from Brownian movies via neural networks
Authors: Youngkyoung Bae, Dong-Kyum Kim, Hawoong Jeong Affiliations: Department of Physics, Korea Advanced Institute of Science and Technology, Daejeon, Korea, Center for Complex Systems, Korea Advanced Institute of Science and Technology, Daejeon, Korea Comments: 15 pages, 8 figures Link: https://arxiv.org/abs/2106.15108 Abstract: Quantifying entropy production (EP) is essential to understand stochastic systems at mesoscopic scales, such as living organisms or biological assemblies. However, without tracking the relevant variables, it is challenging to figure out where and to what extent EP occurs from recorded time-series image data from experiments. Here, applying a convolutional neural network (CNN), a powerful tool for image processing, we develop an estimation method for EP through an unsupervised learning algorithm that calculates only from movies. Together with an attention map of the CNN's last layer, our method can not only quantify stochastic EP but also produce the spatiotemporal pattern of the EP (dissipation map). We show that our method accurately measures the EP and creates a dissipation map in two nonequilibrium systems, the bead-spring model and a network of elastic filaments. We further confirm high performance even with noisy, low spatial resolution data, and partially observed situations. Our method will provide a practical way to obtain dissipation maps and ultimately contribute to uncovering the nonequilibrium nature of complex systems.
【19】 Characterization of the Variation Spaces Corresponding to Shallow Neural Networks
Authors: Jonathan W. Siegel, Jinchao Xu Affiliations: Department of Mathematics, Pennsylvania State University, University Park, PA Comments: arXiv admin note: substantial text overlap with arXiv:2101.12365 Link: https://arxiv.org/abs/2106.15002 Abstract: We consider the variation space corresponding to a dictionary of functions in $L^2(\Omega)$ and present the basic theory of approximation in these spaces. Specifically, we compare the definition based on integral representations with the definition in terms of convex hulls. We show that in many cases, including the dictionaries corresponding to shallow ReLU$^k$ networks and a dictionary of decaying Fourier modes, the two definitions coincide. We also give a partial characterization of the variation space for shallow ReLU$^k$ networks and show that the variation space with respect to the dictionary of decaying Fourier modes corresponds to the Barron spectral space.
【20】 Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks
Authors: Jonathan W. Siegel, Jinchao Xu Affiliations: Department of Mathematics, Pennsylvania State University, University Park, PA Comments: arXiv admin note: substantial text overlap with arXiv:2101.12365 Link: https://arxiv.org/abs/2106.14997 Abstract: We consider the approximation rates of shallow neural networks with respect to the variation norm. Upper bounds on these rates have been established for sigmoidal and ReLU activation functions, but it has remained an important open problem whether these rates are sharp. In this article, we provide a solution to this problem by proving sharp lower bounds on the approximation rates for shallow neural networks, which are obtained by lower bounding the $L^2$-metric entropy of the convex hull of the neural network basis functions. In addition, our methods also give sharp lower bounds on the Kolmogorov $n$-widths of this convex hull, which show that the variation spaces corresponding to shallow neural networks cannot be efficiently approximated by linear methods. These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU. Our results also quantify how much stronger the Barron spectral norm is than the variation norm and, combined with previous results, give the asymptotics of the $L^\infty$-metric entropy up to logarithmic factors in the case of the ReLU activation function.
Other (18 papers)
【1】 An Image is Worth More Than a Thousand Words: Towards Disentanglement in the Wild
Authors: Aviv Gabbay, Niv Cohen, Yedid Hoshen Affiliations: School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel Link: https://arxiv.org/abs/2106.15610 Abstract: Unsupervised disentanglement has been shown to be theoretically impossible without inductive biases on the models and the data. As an alternative approach, recent methods rely on limited supervision to disentangle the factors of variation and allow their identifiability. While annotating the true generative factors is only required for a limited number of observations, we argue that it is infeasible to enumerate all the factors of variation that describe a real-world image distribution. To this end, we propose a method for disentangling a set of factors which are only partially labeled, as well as separating the complementary set of residual factors that are never explicitly specified. Our success in this challenging setting, demonstrated on synthetic benchmarks, gives rise to leveraging off-the-shelf image descriptors to partially annotate a subset of attributes in real image domains (e.g. of human faces) with minimal manual effort. Specifically, we use a recent language-image embedding model (CLIP) to annotate a set of attributes of interest in a zero-shot manner and demonstrate state-of-the-art disentangled image manipulation results.
【2】 An Ambient Intelligence-Based Human Behavior Monitoring Framework for Ubiquitous Environments
Authors: Nirmalya Thakur, Chia Y. Han Link: https://arxiv.org/abs/2106.15609 Abstract: This framework for human behavior monitoring aims to take a holistic approach to study, track, monitor, and analyze human behavior during activities of daily living (ADLs). The framework consists of two novel functionalities. First, it can perform the semantic analysis of user interactions on the diverse contextual parameters during ADLs to identify a list of distinct behavioral patterns associated with different complex activities. Second, it consists of an intelligent decision-making algorithm that can analyze these behavioral patterns and their relationships with the dynamic contextual and spatial features of the environment to detect any anomalies in user behavior that could constitute an emergency. These functionalities of this interdisciplinary framework were developed by integrating the latest advancements and technologies in human-computer interaction, machine learning, Internet of Things, pattern recognition, and ubiquitous computing. The framework was evaluated on a dataset of ADLs, and the performance accuracies of these two functionalities were found to be 76.71% and 83.87%, respectively. The presented and discussed results uphold the relevance and immense potential of this framework to contribute towards improving the quality of life and assisted living of the aging population in the future of Internet of Things (IoT)-based ubiquitous living environments, e.g., smart homes.
【3】 Multimodal Approaches for Indoor Localization for Ambient Assisted Living in Smart Homes
Authors: Nirmalya Thakur, Chia Y. Han Link: https://arxiv.org/abs/2106.15606 Abstract: This work makes multiple scientific contributions to the field of Indoor Localization for Ambient Assisted Living in Smart Homes. First, it presents a Big-Data driven methodology that studies the multimodal components of user interactions and analyzes the data from Bluetooth Low Energy (BLE) beacons and BLE scanners to detect a user's indoor location in a specific activity-based zone during Activities of Daily Living. Second, it introduces a context-independent approach that can interpret the accelerometer and gyroscope data from diverse behavioral patterns to detect the zone-based indoor location of a user in any Internet of Things (IoT)-based environment. These two approaches achieved performance accuracies of 81.36% and 81.13%, respectively, when tested on a dataset. Third, it presents a methodology to detect the spatial coordinates of a user's indoor position that outperforms all similar works in this field, as per the associated root mean squared error, one of the performance evaluation metrics in ISO/IEC 18305:2016, an international standard for testing Localization and Tracking Systems. Finally, it presents a comprehensive comparative study that includes Random Forest, Artificial Neural Network, Decision Tree, Support Vector Machine, k-NN, Gradient Boosted Trees, Deep Learning, and Linear Regression, to address the challenge of identifying the optimal machine learning approach for Indoor Localization.
【4】 Framework for an Intelligent Affect Aware Smart Home Environment for Elderly People
Authors: Nirmalya Thakur, Chia Y. Han Affiliations: Department of Electrical Engineering and Computer Science, University of Cincinnati, Cincinnati Link: https://arxiv.org/abs/2106.15599 Abstract: The population of elderly people has been increasing at a rapid rate over the last few decades and is expected to increase further in the near future. This growing population is associated with growing needs due to problems such as physical disabilities, cognitive issues, weakened memory and disorganized behavior that elderly people face with increasing age. To reduce their financial burden on the world economy and to enhance their quality of life, it is essential to develop technology-based solutions that are adaptive, assistive and intelligent in nature. Intelligent Affect Aware Systems that can not only analyze but also predict the behavior of elderly people in the context of their day-to-day interactions with technology in an IoT-based environment hold immense potential for serving as a long-term solution for improving the user experience of elderly people in smart homes. This work therefore proposes the framework for an Intelligent Affect Aware environment for elderly people that can not only analyze the affective components of their interactions but also predict their likely user experience even before they start engaging in any activity in the given smart home environment. This forecasting of user experience would provide scope for enhancing the same, thereby increasing the assistive and adaptive nature of such intelligent systems. To uphold the efficacy of this proposed framework for improving the quality of life of elderly people in smart homes, it has been tested on three datasets and the results are presented and discussed.
【5】 Continuous Latent Process Flows
Authors: Ruizhi Deng, Marcus A. Brubaker, Greg Mori, Andreas M. Lehrmann Link: https://arxiv.org/abs/2106.15580 Abstract: Partial observations of continuous time-series dynamics at arbitrary time stamps exist in many disciplines. Fitting this type of data using statistical models with continuous dynamics is not only promising at an intuitive level but also has practical benefits, including the ability to generate continuous trajectories and to perform inference on previously unseen time stamps. Despite exciting progress in this area, the existing models still face challenges in terms of their representational power and the quality of their variational approximations. We tackle these challenges with continuous latent process flows (CLPF), a principled architecture decoding continuous latent processes into continuous observable processes using a time-dependent normalizing flow driven by a stochastic differential equation. To optimize our model using maximum likelihood, we propose a novel piecewise construction of a variational posterior process and derive the corresponding variational lower bound using trajectory re-weighting. Our ablation studies demonstrate the effectiveness of our contributions in various inference tasks on irregular time grids. Comparisons to state-of-the-art baselines show our model's favourable performance on both synthetic and real-world time-series data.
【6】 A Mechanism for Producing Aligned Latent Spaces with Autoencoders
Authors: Saachi Jain, Adityanarayanan Radhakrishnan, Caroline Uhler Affiliations: Massachusetts Institute of Technology Link: https://arxiv.org/abs/2106.15456 Abstract: Aligned latent spaces, where meaningful semantic shifts in the input space correspond to a translation in the embedding space, play an important role in the success of downstream tasks such as unsupervised clustering and data imputation. In this work, we prove that linear and nonlinear autoencoders produce aligned latent spaces by stretching along the left singular vectors of the data. We fully characterize the amount of stretching in linear autoencoders and provide an initialization scheme to arbitrarily stretch along the top directions using these networks. We also quantify the amount of stretching in nonlinear autoencoders in a simplified setting. We use our theoretical results to align drug signatures across cell types in gene expression space and semantic shifts in word embedding spaces.
【7】 MAML is a Noisy Contrastive Learner
Authors: Chia-Hsiang Kao, Wei-Chen Chiu, Pin-Yu Chen Affiliations: National Yang Ming Chiao Tung University, Taiwan, IBM Research Comments: 15 pages, 11 figures Link: https://arxiv.org/abs/2106.15367 Abstract: Model-agnostic meta-learning (MAML) is one of the most popular and widely adopted meta-learning algorithms, and it has achieved remarkable success in various learning problems. Yet, with its unique design of nested inner-loop and outer-loop updates, which respectively govern the task-specific and meta-model-centric learning, the underlying learning objective of MAML remains implicit, which impedes a more straightforward understanding of it. In this paper, we provide a new perspective on the working mechanism of MAML and discover that MAML is analogous to a meta-learner using a supervised contrastive objective function, where the query features are pulled towards the support features of the same class and pushed away from those of different classes. This contrastiveness is experimentally verified via an analysis based on the cosine similarity. Moreover, our analysis reveals that the vanilla MAML algorithm has an undesirable interference term originating from the random initialization and the cross-task interaction. We therefore propose a simple but effective technique, the zeroing trick, to alleviate such interference. Extensive experiments on both the miniImagenet and Omniglot datasets demonstrate the consistent improvement brought by our proposed technique, validating its effectiveness.
【8】 Scalable Gaussian Processes for Data-Driven Design using Big Data with Categorical Factors
Authors: Liwei Wang, Akshay Iyer, Suraj Yerramilli, Daniel Apley, Ping Zhu, Wei Chen Affiliations: The State Key Laboratory of Mechanical System and Vibration, Shanghai Key Laboratory of Digital Manufacture for Thin-Walled Structures, School of Mechanical Engineering, Shanghai Jiao Tong University, Shanghai, P.R. China, Dept. of Mechanical Engineering Comments: Preprint submitted to Journal of Mechanical Design Link: https://arxiv.org/abs/2106.15356 Abstract: Scientific and engineering problems often require the use of artificial intelligence to aid understanding and the search for promising designs. While Gaussian processes (GP) stand out as easy-to-use and interpretable learners, they have difficulties in accommodating big datasets, categorical inputs, and multiple responses, which have become common challenges for a growing number of data-driven design applications. In this paper, we propose a GP model that utilizes latent variables and functions obtained through variational inference to address the aforementioned challenges simultaneously. The method is built upon the latent variable Gaussian process (LVGP) model where categorical factors are mapped into a continuous latent space to enable GP modeling of mixed-variable datasets. By extending variational inference to LVGP models, the large training dataset is replaced by a small set of inducing points to address the scalability issue. Output response vectors are represented by a linear combination of independent latent functions, forming a flexible kernel structure to handle multiple responses that might have distinct behaviors. Comparative studies demonstrate that the proposed method scales well for large datasets with over $10^4$ data points, while outperforming state-of-the-art machine learning methods without requiring much hyperparameter tuning. In addition, an interpretable latent space is obtained to draw insights into the effect of categorical factors, such as those associated with building blocks of architectures and element choices in metamaterial and materials design. Our approach is demonstrated for machine learning of ternary oxide materials and topology optimization of a multiscale compliant mechanism with aperiodic microstructures and multiple materials.
【9】 Probabilistic Attention for Interactive Segmentation
Authors: Prasad Gabbur, Manjot Bilkhu, Javier Movellan Affiliations: Apple Comments: 17 pages, 8 figures Link: https://arxiv.org/abs/2106.15338 Abstract: We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of some pixels, and we need this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance ($\sim10\%$ mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime. A PyTorch layer implementation of our probabilistic attention model will be made publicly available.
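For readers who want the baseline this abstract starts from, here is a minimal sketch of standard scaled dot-product attention for a single query, written in plain Python. It does not reproduce the paper's probabilistic model or its Expectation Maximization updates; it only shows that the softmax weights are non-negative and sum to one, which is what makes a probabilistic reading over keys possible. The function name and the toy inputs are hypothetical.

```python
import math


def dot_product_attention(query, keys, values, scale=None):
    """Scaled dot-product attention for one query vector.

    Returns (weights, output). Illustrative baseline only; this is an
    assumption-free textbook form, not the paper's probabilistic layer.
    """
    if scale is None:
        scale = 1.0 / math.sqrt(len(query))
    # Similarity score between the query and every key.
    scores = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    # Numerically stable softmax over the scores.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    norm = sum(exps)
    weights = [e / norm for e in exps]
    # Output is the weight-averaged value vector.
    dim_v = len(values[0])
    output = [sum(w * v[j] for w, v in zip(weights, values))
              for j in range(dim_v)]
    return weights, output


# Hypothetical toy inputs: the query is aligned with the first key.
weights, output = dot_product_attention(
    [1.0, 0.0],
    [[1.0, 0.0], [0.0, 1.0]],
    [[10.0], [20.0]],
)
```

Because the query matches the first key more closely, the first weight is the larger one and the output lies closer to the first value.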
【10】 Privacy Budget Scheduling
Authors: Tao Luo, Mingen Pan, Pierre Tholoniat, Asaf Cidon, Roxana Geambasu, Mathias Lécuyer Affiliations: Columbia University, Microsoft Research Comments: Extended version of a paper presented at the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI '21) Link: https://arxiv.org/abs/2106.15335 Abstract: Machine learning (ML) models trained on personal data have been shown to leak information about users. Differential privacy (DP) enables model training with a guaranteed bound on this leakage. Each new model trained with DP increases the bound on data leakage and can be seen as consuming part of a global privacy budget that should not be exceeded. This budget is a scarce resource that must be carefully managed to maximize the number of successfully trained models. We describe PrivateKube, an extension to the popular Kubernetes datacenter orchestrator that adds privacy as a new type of resource to be managed alongside other traditional compute resources, such as CPU, GPU, and memory. The abstractions we design for the privacy resource mirror those defined by Kubernetes for traditional resources, but there are also major differences. For example, traditional compute resources are replenishable while privacy is not: a CPU can be regained after a model finishes execution while privacy budget cannot. This distinction forces a re-design of the scheduler. We present DPF (Dominant Private Block Fairness), a variant of the popular Dominant Resource Fairness (DRF) algorithm that is geared toward the non-replenishable privacy resource but enjoys theoretical properties similar to those of DRF. We evaluate PrivateKube and DPF on microbenchmarks and an ML workload on Amazon Reviews data. Compared to existing baselines, DPF allows training more models under the same global privacy guarantee. This is especially true for DPF over Rényi DP, a highly composable form of DP.
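The non-replenishable nature of the privacy budget can be illustrated with a minimal accounting sketch. It assumes basic sequential composition, under which the epsilon values of successive DP computations on the same data add up. The `PrivacyBlock` class and its methods are hypothetical names invented for this example; this is neither the PrivateKube API nor the DPF algorithm.

```python
class PrivacyBlock:
    """Hypothetical data block with a fixed epsilon budget (illustration only)."""

    def __init__(self, capacity):
        self.capacity = capacity   # total epsilon available for this block
        self.consumed = 0.0        # epsilon spent so far; it never decreases

    def remaining(self):
        return self.capacity - self.consumed

    def try_allocate(self, epsilon):
        """Grant the request only if the total stays within the budget."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Small tolerance guards against floating-point round-off.
        if self.consumed + epsilon > self.capacity + 1e-12:
            return False
        self.consumed += epsilon
        return True

block = PrivacyBlock(capacity=1.0)
# Four training jobs request budget in arrival order.
results = [block.try_allocate(eps) for eps in (0.4, 0.5, 0.3, 0.1)]
```

Unlike a CPU allocation, there is deliberately no `release` method: once a job has consumed epsilon, that share of the block is gone, which is the property that motivates a dedicated scheduler in the abstract.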
【11】 Artificial Intelligence in Minimally Invasive Interventional Treatment
Author: Daniel Ruijters Affiliations: Philips Healthcare, Image Guided Therapy Systems Innovation, the Netherlands; Technische Universiteit Eindhoven, Dept. Electrical Engineering, Eindhoven Link: https://arxiv.org/abs/2106.15306 Abstract: Minimally invasive image guided treatment procedures often employ advanced image processing algorithms. The recent developments of artificial intelligence algorithms harbor potential to further enhance this domain. In this article, we explore several application areas within the minimally invasive treatment space and discuss the deployment of artificial intelligence within these areas.
【12】 MuViS: Online MU-MIMO Grouping for Multi-User Applications Over Commodity WiFi
Authors: Hannaneh Barahouei Pasandi, Tamer Nadeem, Hadi Amirpour Affiliations: Virginia Commonwealth University, University of Klagenfurt Comments: 14 pages Link: https://arxiv.org/abs/2106.15262 Abstract: Over the last decade, the bandwidth expansion and MU-MIMO spectral efficiency have promised to increase data throughput by allowing concurrent communication between one Access Point and multiple users. However, we are still a long way from enjoying such MU-MIMO MAC protocol improvements for bandwidth-hungry applications such as video streaming in practical WiFi network settings due to heterogeneous channel conditions and devices, unreliable transmissions, and lack of useful feedback exchange among the lower and upper layers' requirements. This paper introduces MuViS, a novel dual-phase optimization framework that proposes a Quality of Experience (QoE)-aware MU-MIMO optimization for multi-user video streaming over IEEE 802.11ac. MuViS first employs reinforcement learning to optimize the MU-MIMO user group and mode selection for users based on their PHY/MAC layer characteristics. The video bitrate is then optimized based on the user's mode (Multi-User (MU) or Single-User (SU)). We present our design and its evaluation on smartphones and laptops using 802.11ac WiFi. Our experimental results in various indoor environments and configurations show a scalable framework that can support a large number of users streaming at high video rates while satisfying QoE requirements.
【13】 Joint Majorization-Minimization for Nonnegative Matrix Factorization with the β-divergence
Authors: Arthur Marmin, José Henrique de Morais Goulart, Cédric Févotte Link: https://arxiv.org/abs/2106.15214 Abstract: This article proposes new multiplicative updates for nonnegative matrix factorization (NMF) with the $\beta$-divergence objective function. Our new updates are derived from a joint majorization-minimization (MM) scheme, in which an auxiliary function (a tight upper bound of the objective function) is built for the two factors jointly and minimized at each iteration. This is in contrast with the classic approach in which the factors are optimized alternately and an MM scheme is applied to each factor individually. Like the classic approach, our joint MM algorithm also results in multiplicative updates that are simple to implement. However, they yield a significant drop in computation time (for equally good solutions), in particular for some $\beta$-divergences of important applicative interest, such as the squared Euclidean distance and the Kullback-Leibler or Itakura-Saito divergences. We report experimental results using diverse datasets: face images, audio spectrograms, hyperspectral data and song play counts. Depending on the value of $\beta$ and on the dataset, our joint MM approach yields a CPU time reduction of about $10\%$ to $78\%$ in comparison to the classic alternating scheme.
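As background for the comparison in this abstract, the classic alternating multiplicative updates (the baseline, not the joint MM scheme the paper proposes) can be sketched as below. The sketch assumes $\beta \in \{1, 2\}$, i.e. the Kullback-Leibler and squared Euclidean cases, for which the plain multiplicative update needs no extra exponent. Function names, the initialisation, and the `eps` safeguard are assumptions made for this example.

```python
import numpy as np

def beta_divergence(V, V_hat, beta):
    """Beta-divergence for beta = 2 (squared Euclidean) or beta = 1 (KL)."""
    if beta == 2:
        return 0.5 * np.sum((V - V_hat) ** 2)
    if beta == 1:
        return np.sum(V * np.log(V / V_hat) - V + V_hat)
    raise ValueError("this sketch only handles beta in {1, 2}")

def nmf_alternating_mu(V, rank, beta=2, n_iter=100, seed=0, eps=1e-12):
    """Classic alternating multiplicative updates: H first, then W."""
    rng = np.random.default_rng(seed)
    n_rows, n_cols = V.shape
    W = rng.random((n_rows, rank)) + eps
    H = rng.random((rank, n_cols)) + eps
    losses = [beta_divergence(V, W @ H + eps, beta)]
    for _ in range(n_iter):
        WH = W @ H + eps
        # Update H with W held fixed.
        H *= (W.T @ (V * WH ** (beta - 2))) / (W.T @ WH ** (beta - 1) + eps)
        WH = W @ H + eps
        # Update W with H held fixed.
        W *= ((V * WH ** (beta - 2)) @ H.T) / (WH ** (beta - 1) @ H.T + eps)
        losses.append(beta_divergence(V, W @ H + eps, beta))
    return W, H, losses

rng = np.random.default_rng(1)
V = rng.random((20, 15)) + 0.1          # strictly positive toy data
W, H, losses = nmf_alternating_mu(V, rank=3, beta=2, n_iter=50)
W_kl, H_kl, losses_kl = nmf_alternating_mu(V, rank=3, beta=1, n_iter=50)
```

Because the updates are multiplicative and start from positive factors, `W` and `H` stay nonnegative without any projection step, which is what makes this family of algorithms simple to implement.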
【14】 TUCaN: Progressively Teaching Colourisation to Capsules
Authors: Rita Pucci, Niki Martinel Affiliation: Machine Learning and Perception Lab, University of Udine, Udine Link: https://arxiv.org/abs/2106.15176 Abstract: Automatic image colourisation is the computer vision research path that studies how to colourise greyscale images (for restoration). Deep learning techniques improved image colourisation yielding astonishing results. These differ by various factors, such as structural differences, input types, user assistance, etc. Most of them base the architectural structure on convolutional layers, with no emphasis on layers specialised in object feature extraction. We introduce a novel downsampling-upsampling architecture named TUCaN (Tiny UCapsNet) that exploits the collaboration of convolutional layers and capsule layers to obtain a neat colourisation of entities present in every single image. This is obtained by enforcing collaboration among such layers by skip and residual connections. We pose the problem as a per-pixel colour classification task that identifies colours as bins in a quantized space. To train the network, in contrast with the standard end-to-end learning method, we propose a progressive learning scheme to extract the context of objects by only manipulating the learning process without changing the model. In this scheme, the upsampling starts from the reconstruction of low resolution images and progressively grows to high resolution images throughout the training phase. Experimental results on three benchmark datasets show that our approach with the ImageNet10k dataset outperforms existing methods on standard quality metrics and achieves state-of-the-art performance on image colourisation. We performed a user study to quantify the perceptual realism of the colourisation results; it shows that progressive learning lets TUCaN achieve better colours than the end-to-end scheme and points out the limitations of the existing evaluation metrics.
【15】 How to Reach Real-Time AI on Consumer Devices? Solutions for Programmable and Custom Architectures
Authors: Stylianos I. Venieris, Ioannis Panopoulos, Ilias Leontiadis, Iakovos S. Venieris Affiliations: Samsung AI Center, Cambridge, UK; National Technical University of Athens, Athens, Greece Comments: Invited paper at the 32nd IEEE International Conference on Application-Specific Systems, Architectures and Processors (ASAP), 2021 Link: https://arxiv.org/abs/2106.15021 Abstract: The unprecedented performance of deep neural networks (DNNs) has led to large strides in various Artificial Intelligence (AI) inference tasks, such as object and speech recognition. Nevertheless, deploying such AI models across commodity devices faces significant challenges: large computational cost, multiple performance objectives, hardware heterogeneity and a common need for high accuracy together pose critical problems to the deployment of DNNs across the various embedded and mobile devices in the wild. As such, we have yet to witness the mainstream usage of state-of-the-art deep learning algorithms across consumer devices. In this paper, we provide preliminary answers to this potentially game-changing question by presenting an array of design techniques for efficient AI systems. We start by examining the major roadblocks when targeting both programmable processors and custom accelerators. Then, we present diverse methods for achieving real-time performance following a cross-stack approach. These span model-, system- and hardware-level techniques, and their combination. Our findings provide illustrative examples of AI systems that do not overburden mobile hardware, while also indicating how they can improve inference accuracy. Moreover, we showcase how custom ASIC- and FPGA-based accelerators can be an enabling factor for next-generation AI applications, such as multi-DNN systems. Collectively, these results highlight the critical need for further exploration as to how the various cross-stack solutions can be best combined in order to bring the latest advances in deep learning close to users, in a robust and efficient manner.
【16】 A Survey on Neural Speech Synthesis
Authors: Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu Affiliation: Microsoft Research Asia Comments: A comprehensive survey on TTS, 63 pages, 18 tables, 7 figures, 447 references Link: https://arxiv.org/abs/2106.15561 Abstract: Text to speech (TTS), or speech synthesis, which aims to synthesize intelligible and natural speech given text, is a hot research topic in speech, language, and machine learning communities and has broad applications in the industry. With the development of deep learning and artificial intelligence, neural network-based TTS has significantly improved the quality of synthesized speech in recent years. In this paper, we conduct a comprehensive survey on neural TTS, aiming to provide a good understanding of current research and future trends. We focus on the key components in neural TTS, including text analysis, acoustic models and vocoders, and several advanced topics, including fast TTS, low-resource TTS, robust TTS, expressive TTS, and adaptive TTS. We further summarize resources related to TTS (e.g., datasets, open-source implementations) and discuss future research directions. This survey can serve both academic researchers and industry practitioners working on TTS.
【17】 Fast Approximation of the Sliced-Wasserstein Distance Using Concentration of Random Projections
Authors: Kimia Nadjahi, Alain Durmus, Pierre E. Jacob, Roland Badeau, Umut Şimşekli Affiliations: LTCI, Télécom Paris, Institut Polytechnique de Paris, France; Centre Borelli, ENS Paris-Saclay, CNRS, Université Paris-Saclay, France; Department of Statistics, Harvard University, USA Link: https://arxiv.org/abs/2106.15427 Abstract: The Sliced-Wasserstein distance (SW) is being increasingly used in machine learning applications as an alternative to the Wasserstein distance and offers significant computational and statistical benefits. Since it is defined as an expectation over random projections, SW is commonly approximated by Monte Carlo. We adopt a new perspective to approximate SW by making use of the concentration of measure phenomenon: under mild assumptions, one-dimensional projections of a high-dimensional random vector are approximately Gaussian. Based on this observation, we develop a simple deterministic approximation for SW. Our method does not require sampling a number of random projections, and is therefore both accurate and easy to use compared to the usual Monte Carlo approximation. We derive non-asymptotic guarantees for our approach, and show that the approximation error goes to zero as the dimension increases, under a weak dependence condition on the data distribution. We validate our theoretical findings on synthetic datasets, and illustrate the proposed approximation on a generative modeling problem.
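The usual Monte Carlo approximation that this abstract contrasts with can be sketched as follows: sample random directions on the unit sphere, project both samples, and average the one-dimensional squared Wasserstein distances, which for equal-size samples reduce to differences of sorted projections. This illustrates the baseline only, not the deterministic approximation proposed in the paper; the function name, the order-2 distance, and the equal-sample-size restriction are assumptions made for this example.

```python
import numpy as np

def sliced_wasserstein_mc(X, Y, n_projections=2000, seed=0):
    """Monte Carlo estimate of the order-2 Sliced-Wasserstein distance
    between two empirical distributions with the same number of samples."""
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes X and Y have identical shapes")
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Normalised Gaussian vectors are uniformly distributed on the unit sphere.
    theta = rng.normal(size=(n_projections, d))
    theta /= np.linalg.norm(theta, axis=1, keepdims=True)
    # Column j holds the sorted projections onto direction theta[j].
    X_proj = np.sort(X @ theta.T, axis=0)
    Y_proj = np.sort(Y @ theta.T, axis=0)
    # Average the 1D squared W2 over all directions, then take the root.
    return np.sqrt(np.mean((X_proj - Y_proj) ** 2))

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
sw_same = sliced_wasserstein_mc(X, X)          # identical samples
sw_shift = sliced_wasserstein_mc(X, X + 1.0)   # every coordinate shifted by 1
```

For a pure translation by a vector c in dimension d, the squared SW distance of order 2 equals |c|^2 / d in expectation over directions, so `sw_shift` should be close to 1 here, up to Monte Carlo error in the sampled directions.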
【18】 Federated Dynamic Spectrum Access
Authors: Yifei Song, Hao-Hsuan Chang, Zhou Zhou, Shashank Jere, Lingjia Liu Affiliation: Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University Link: https://arxiv.org/abs/2106.14976 Abstract: Due to the growing volume of data traffic produced by the surge of Internet of Things (IoT) devices, the demand for radio spectrum resources is approaching the limits defined by the Federal Communications Commission (FCC). To this end, Dynamic Spectrum Access (DSA) is considered a promising technology to handle this spectrum scarcity. However, standard DSA techniques often rely on analytical modeling of wireless networks, making their application intractable in under-measured network environments. Therefore, utilizing neural networks to approximate the network dynamics is an alternative approach. In this article, we introduce a Federated Learning (FL) based framework for the task of DSA, where FL is a distributed machine learning framework that can preserve the privacy of network terminals under heterogeneous data distributions. We discuss the opportunities, challenges, and open problems of this framework. To evaluate its feasibility, we implement a Multi-Agent Reinforcement Learning (MARL)-based FL realization and report its initial evaluation results.