
Machine Learning arXiv Daily Digest [8.30]

Author: official account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-09-16 14:32:22

Update! The H5 page now supports collapsible abstracts for a better reading experience! Click "Read the original" to visit arxivdaily.com, which covers CS | physics | math | economics | statistics | finance | biology | electrical engineering and offers search, favorites, and more!

cs.LG, 60 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (4 papers)

【1】 Group-Aware Graph Neural Network for Nationwide City Air Quality Forecasting
Link: https://arxiv.org/abs/2108.12238

Authors: Ling Chen, Jiahui Xu, Binqing Wu, Yuntao Qian, Zhenhong Du, Yansheng Li, Yongjun Zhang
Affiliations: Zhejiang University
Abstract: The problem of air pollution threatens public health. Air quality forecasting can provide the air quality index hours or even days ahead, which helps the public prevent air pollution in advance. Previous works focus on citywide air quality forecasting and cannot solve the nationwide city forecasting problem, whose difficulty lies in capturing the latent dependencies between geographically distant but highly correlated cities. In this paper, we propose the group-aware graph neural network (GAGNN), a hierarchical model for nationwide city air quality forecasting. The model constructs a city graph and a city group graph to model the spatial and latent dependencies between cities, respectively. GAGNN introduces a differentiable grouping network to discover the latent dependencies among cities and generate city groups. Based on the generated city groups, a group correlation encoding module is introduced to learn the correlations between them, which can effectively capture the dependencies between city groups. After the graph construction, GAGNN implements a message passing mechanism to model the dependencies between cities and city groups. Evaluation experiments on a Chinese city air quality dataset indicate that our GAGNN outperforms existing forecasting models.
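
To make the grouping idea above concrete, here is a minimal sketch of a differentiable grouping layer in PyTorch: city nodes are soft-assigned to latent groups through a learned softmax assignment, and group features are aggregated from that assignment. This is a generic DiffPool-style construction for illustration only, not the authors' GAGNN code; all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

class DifferentiableGrouping(nn.Module):
    """Soft-assign nodes to latent groups (illustrative, DiffPool-style)."""
    def __init__(self, in_dim: int, num_groups: int):
        super().__init__()
        self.assign = nn.Linear(in_dim, num_groups)  # assignment logits

    def forward(self, x: torch.Tensor):
        # x: (num_cities, in_dim) node features
        s = torch.softmax(self.assign(x), dim=-1)    # (num_cities, num_groups)
        group_feats = s.t() @ x                      # (num_groups, in_dim)
        return s, group_feats

grouper = DifferentiableGrouping(in_dim=16, num_groups=4)
assignment, groups = grouper(torch.randn(100, 16))   # 100 cities, 16-dim features
```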

【2】 Enel: Context-Aware Dynamic Scaling of Distributed Dataflow Jobs using Graph Propagation
Link: https://arxiv.org/abs/2108.12211

Authors: Dominik Scheinert, Houkun Zhu, Lauritz Thamsen, Morgan K. Geldenhuys, Jonathan Will, Alexander Acker, Odej Kao
Affiliations: Technische Universität Berlin
Note: 8 pages, 5 figures, 3 tables
Abstract: Distributed dataflow systems like Spark and Flink enable the use of clusters for scalable data analytics. While runtime prediction models can be used to initially select appropriate cluster resources given target runtimes, the actual runtime performance of dataflow jobs depends on several factors and varies over time. Yet, in many situations, dynamic scaling can be used to meet formulated runtime targets despite significant performance variance. This paper presents Enel, a novel dynamic scaling approach that uses message propagation on an attributed graph to model dataflow jobs and, thus, allows for deriving effective rescaling decisions. For this, Enel incorporates descriptive properties that capture the respective execution context, considers statistics from individual dataflow tasks, and propagates predictions through the job graph to eventually find an optimized new scale-out. Our evaluation of Enel with four iterative Spark jobs shows that our approach is able to identify effective rescaling actions, reacting for instance to node failures, and can be reused across different execution contexts.

【3】 Graph-based Incident Aggregation for Large-Scale Online Service Systems
Link: https://arxiv.org/abs/2108.12179

Authors: Zhuangbin Chen, Jinyang Liu, Yuxin Su, Hongyu Zhang, Xuemin Wen, Xiao Ling, Yongqiang Yang, Michael R. Lyu
Affiliations: The Chinese University of Hong Kong; The University of Newcastle
Note: Accepted by the 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE'21)
Abstract: As online service systems continue to grow in terms of complexity and volume, how service incidents are managed will significantly impact company revenue and user trust. Due to the cascading effect, cloud failures often come with an overwhelming number of incidents from dependent services and devices. To pursue efficient incident management, related incidents should be quickly aggregated to narrow down the problem scope. To this end, in this paper, we propose GRLIA, an incident aggregation framework based on graph representation learning over the cascading graph of cloud failures. A representation vector is learned for each unique type of incident in an unsupervised and unified manner, which is able to simultaneously encode the topological and temporal correlations among incidents. Thus, it can be easily employed for online incident aggregation. In particular, to learn the correlations more accurately, we try to recover the complete scope of failures' cascading impact by leveraging fine-grained system monitoring data, i.e., Key Performance Indicators (KPIs). The proposed framework is evaluated with real-world incident data collected from a large-scale online service system of Huawei Cloud. The experimental results demonstrate that GRLIA is effective and outperforms existing methods. Furthermore, our framework has been successfully deployed in industrial practice.

【4】 Towards Self-Explainable Graph Neural Network
Link: https://arxiv.org/abs/2108.12055

Authors: Enyan Dai, Suhang Wang
Affiliations: The Pennsylvania State University
Abstract: Graph Neural Networks (GNNs), which generalize deep neural networks to graph-structured data, have achieved great success in modeling graphs. However, as an extension of deep learning to graphs, GNNs lack explainability, which largely limits their adoption in scenarios that demand model transparency. Though many efforts have been made to improve the explainability of deep learning, they mainly focus on i.i.d. data and cannot be directly applied to explain the predictions of GNNs, because GNNs utilize both node features and graph topology to make predictions. There is only very little work on the explainability of GNNs, and it focuses on post-hoc explanations. Since post-hoc explanations are not directly obtained from the GNNs, they can be biased and misrepresent the true explanations. Therefore, in this paper, we study a novel problem of self-explainable GNNs which can simultaneously give predictions and explanations. We propose a new framework which can find the $K$-nearest labeled nodes for each unlabeled node to give explainable node classification, where the nearest labeled nodes are found by an interpretable similarity module in terms of both node similarity and local structure similarity. Extensive experiments on real-world and synthetic datasets demonstrate the effectiveness of the proposed framework for explainable node classification.
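
To illustrate the "$K$-nearest labeled nodes" idea, the sketch below scores labeled nodes by cosine similarity of node embeddings. The paper's interpretable similarity module also accounts for local structure similarity, which is omitted here; all names and shapes are illustrative assumptions.

```python
import numpy as np

def k_nearest_labeled(emb, labeled_idx, query_idx, k=3):
    """Indices of the k labeled nodes most similar to the query node."""
    e = emb / np.linalg.norm(emb, axis=1, keepdims=True)  # unit-normalize
    sims = e[labeled_idx] @ e[query_idx]                  # cosine similarities
    top = np.argsort(-sims)[:k]
    return [labeled_idx[i] for i in top]

emb = np.random.randn(50, 8)                              # toy node embeddings
neighbors = k_nearest_labeled(emb, labeled_idx=list(range(10)), query_idx=42)
```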

Transformer (2 papers)

【1】 The Devil is in the Detail: Simple Tricks Improve Systematic Generalization of Transformers
Link: https://arxiv.org/abs/2108.12284

Authors: Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber
Affiliations: The Swiss AI Lab IDSIA, USI & SUPSI, Lugano, Switzerland
Note: Accepted to EMNLP 2021
Abstract: Recently, many datasets have been proposed to test the systematic generalization ability of neural networks. The companion baseline Transformers, typically trained with default hyper-parameters from standard tasks, are shown to fail dramatically. Here we demonstrate that by revisiting model configurations as basic as scaling of embeddings, early stopping, relative positional embedding, and Universal Transformer variants, we can drastically improve the performance of Transformers on systematic generalization. We report improvements on five popular datasets: SCAN, CFQ, PCFG, COGS, and the Mathematics dataset. Our models improve accuracy from 50% to 85% on the PCFG productivity split, and from 35% to 81% on COGS. On SCAN, relative positional embedding largely mitigates the EOS decision problem (Newman et al., 2020), yielding 100% accuracy on the length split with a cutoff at 26. Importantly, performance differences between these models are typically invisible on the IID data split. This calls for proper generalization validation sets for developing neural networks that generalize systematically. We publicly release the code to reproduce our results.
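
One of the revisited tricks, embedding scaling, is simple to illustrate. The sketch below multiplies token embeddings by sqrt(d_model), the standard Transformer convention; the paper's exact configurations may differ, and all sizes here are assumptions.

```python
import math
import torch
import torch.nn as nn

class ScaledEmbedding(nn.Module):
    """Token embedding multiplied by sqrt(d_model)."""
    def __init__(self, vocab_size: int, d_model: int):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, d_model)
        self.scale = math.sqrt(d_model)  # keeps embeddings on a scale
                                         # comparable to positional encodings

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        return self.emb(tokens) * self.scale

layer = ScaledEmbedding(vocab_size=1000, d_model=512)
out = layer(torch.tensor([[1, 2, 3]]))  # shape (1, 3, 512)
```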

【2】 Can the Transformer Be Used as a Drop-in Replacement for RNNs in Text-Generating GANs?
Link: https://arxiv.org/abs/2108.12275

Authors: Kevin Blin, Andrei Kucharavy
Affiliations: Section of Communication Systems, Bachelor semester, EPFL, Lausanne, Switzerland; Distributed Computing Laboratory
Note: accepted to RANLP 2021
Abstract: In this paper we address the problem of fine-tuned text generation with a limited computational budget. For that, we use a well-performing text generative adversarial network (GAN) architecture, Diversity-Promoting GAN (DPGAN), and attempt a drop-in replacement of the LSTM layer with a self-attention-based Transformer layer in order to leverage its efficiency. The resulting Self-Attention DPGAN (SADPGAN) was evaluated for performance, quality and diversity of generated text, and stability. Computational experiments suggested that a transformer architecture is unable to drop-in replace the LSTM layer: it under-performs during the pre-training phase and undergoes a complete mode collapse during the GAN tuning phase. Our results suggest that the transformer architecture needs to be adapted before it can be used as a replacement for RNNs in text-generating GANs.

GAN | adversarial | attacks | generation (1 paper)

【1】 Using GAN-based models to sentimental analysis on imbalanced datasets in education domain
Link: https://arxiv.org/abs/2108.12061

Authors: Ru Yang, Maryam Edalati
Affiliations: Department of Computer Science, Norwegian University of Science and Technology, Gjøvik, Norway
Abstract: While the whole world is still struggling with the COVID-19 pandemic, online learning and home office have become more common. Many schools have transferred their course teaching to online classrooms. Therefore, it is significant to mine students' feedback and opinions from their reviews of their studies, so that both schools and teachers can know where they need to improve. This paper trains machine learning and deep learning models using both balanced and imbalanced datasets for sentiment classification. Two SOTA category-aware text generation GAN models, CatGAN and SentiGAN, are utilized to synthesize text used to balance the highly imbalanced dataset. Results on three datasets with different imbalance degrees from distinct domains show that when using generated text to balance the dataset, the F1-score of machine learning and deep learning models on sentiment classification increases by 2.79%~9.28%. The results also indicate that the average gain for CR100k is higher than for CR23k, that deep learning models gain more on average than machine learning algorithms, and that more complex deep learning models gain more on average than simpler ones in our experiments.

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (2 papers)

【1】 Contrastive Mixup: Self- and Semi-Supervised learning for Tabular Domain
Link: https://arxiv.org/abs/2108.12296

Authors: Sajad Darabi, Shayan Fazeli, Ali Pazoki, Sriram Sankararaman, Majid Sarrafzadeh
Affiliations: UCLA
Abstract: Recent literature in self-supervised learning has demonstrated significant progress in closing the gap between supervised and unsupervised methods in the image and text domains. These methods rely on domain-specific augmentations that are not directly amenable to the tabular domain. Instead, we introduce Contrastive Mixup, a semi-supervised learning framework for tabular data, and demonstrate its effectiveness in limited annotated data settings. Our proposed method leverages Mixup-based augmentation under the manifold assumption by mapping samples to a low-dimensional latent space and encouraging interpolated samples to have a high similarity within the same labeled class. Unlabeled samples are additionally employed via a transductive label propagation method to further enrich the set of similar and dissimilar pairs that can be used in the contrastive loss term. We demonstrate the effectiveness of the proposed framework on public tabular datasets and real-world clinical datasets.
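
The Mixup-based augmentation the framework builds on can be sketched in a few lines: two latent representations are convexly interpolated with a Beta-distributed coefficient. This is the generic technique only, not the authors' implementation; names are illustrative.

```python
import numpy as np

def mixup(z1, z2, alpha=0.2, rng=np.random.default_rng(0)):
    """Convexly interpolate two latent representations."""
    lam = rng.beta(alpha, alpha)        # mixing coefficient in (0, 1)
    return lam * z1 + (1.0 - lam) * z2, lam

z1, z2 = np.random.randn(16), np.random.randn(16)  # latent codes of two samples
z_mix, lam = mixup(z1, z2)
```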

【2】 A Framework for Supervised Heterogeneous Transfer Learning using Dynamic Distribution Adaptation and Manifold Regularization
Link: https://arxiv.org/abs/2108.12293

Authors: Md Geaur Rahman, Md Zahidul Islam
Affiliations: School of Computing, Mathematics and Engineering, Charles Sturt University, Australia
Note: 34 pages, 10 figures
Abstract: Transfer learning aims to learn classifiers for a target domain by transferring knowledge from a source domain. However, due to two main issues, feature discrepancy and distribution divergence, transfer learning can be a very difficult problem in practice. In this paper, we present a framework called TLF that builds a classifier for a target domain having only a few labeled training records by transferring knowledge from a source domain having many labeled records. While existing methods often focus on one issue and leave the other for future work, TLF is capable of handling both issues simultaneously. In TLF, we alleviate feature discrepancy by identifying shared label distributions that act as pivots to bridge the domains. We handle distribution divergence by simultaneously optimizing the structural risk functional, the joint distributions between domains, and the manifold consistency underlying the marginal distributions. Moreover, for manifold consistency we exploit its intrinsic properties by identifying the k nearest neighbors of a record, where the value of k is determined automatically in TLF. Furthermore, since negative transfer is not desired, we consider only the source records that belong to the source pivots during the knowledge transfer. We evaluate TLF on seven publicly available natural datasets and compare its performance against eleven state-of-the-art techniques. We also evaluate the effectiveness of TLF in some challenging situations. Our experimental results, including statistical sign test and Nemenyi test analyses, indicate a clear superiority of the proposed framework over the state-of-the-art techniques.

Transfer | zero/few/one-shot | adaptation (4 papers)

【1】 An Adaptive Clustering Approach for Accident Prediction
Link: https://arxiv.org/abs/2108.12308

Authors: Rajjat Dadwal, Thorben Funke, Elena Demidova
Note: 7 pages, 4 figures
Abstract: Traffic accident prediction is a crucial task in the mobility domain. State-of-the-art accident prediction approaches are based on static and uniform grid-based geospatial aggregations, limiting their capability for fine-grained predictions. This property becomes particularly problematic in more complex regions such as city centers. In such regions, a grid cell can contain subregions with different properties; furthermore, an actual accident-prone region can be split across grid cells arbitrarily. This paper proposes Adaptive Clustering Accident Prediction (ACAP), a novel accident prediction method based on a grid-growing algorithm. ACAP applies adaptive clustering to the observed geospatial accident distribution and performs embeddings of temporal, accident-related, and regional features to increase prediction accuracy. We demonstrate the effectiveness of the proposed ACAP method using open real-world accident datasets from three cities in Germany. We demonstrate that ACAP improves the accident prediction performance for complex regions by 2-3 percentage points in F1-score by adapting the geospatial aggregation to the distribution of the underlying spatio-temporal events. Our grid-growing approach outperforms the clustering-based baselines by four percentage points in F1-score on average.

【2】 Continual learning under domain transfer with sparse synaptic bursting
Link: https://arxiv.org/abs/2108.12056

Authors: Shawn L. Beaulieu, Jeff Clune, Nick Cheney
Affiliations: Systems Center; Department of Computer Science, University of British Columbia, Vancouver, BC, Canada
Abstract: Existing machines are functionally specific tools that were made for easy prediction and control. Tomorrow's machines may be closer to biological systems in their mutability, resilience, and autonomy. But first they must be capable of learning, and retaining, new information without repeated exposure to it. Past efforts to engineer such systems have sought to build or regulate artificial neural networks using task-specific modules with constrained circumstances of application. This has not yet enabled continual learning over long sequences of previously unseen data without corrupting existing knowledge: a problem known as catastrophic forgetting. In this paper, we introduce a system that can learn sequentially over previously unseen datasets (ImageNet, CIFAR-100) with little forgetting over time. This is accomplished by regulating the activity of weights in a convolutional neural network on the basis of inputs using top-down modulation generated by a second feed-forward neural network. We find that our method learns continually under domain transfer with sparse bursts of activity in weights that are recycled across tasks, rather than by maintaining task-specific modules. Sparse synaptic bursting is found to balance enhanced and diminished activity in a way that facilitates adaptation to new inputs without corrupting previously acquired functions. This behavior emerges during a prior meta-learning phase in which regulated synapses are selectively disinhibited, or grown, from an initial state of uniform suppression.

【3】 Finite-time System Identification and Adaptive Control in Autoregressive Exogenous Systems
Link: https://arxiv.org/abs/2108.11959

Authors: Sahin Lale, Kamyar Azizzadenesheli, Babak Hassibi, Anima Anandkumar
Affiliations: California Institute of Technology, Purdue University
Note: 3rd Annual Learning for Dynamics & Control Conference (L4DC)
Abstract: Autoregressive exogenous (ARX) systems are the general class of input-output dynamical systems used for modeling stochastic linear dynamical systems (LDS), including partially observable LDS such as LQG systems. In this work, we study the problem of system identification and adaptive control of unknown ARX systems. We provide finite-time learning guarantees for ARX systems under both open-loop and closed-loop data collection. Using these guarantees, we design adaptive control algorithms for unknown ARX systems with arbitrary strongly convex or convex quadratic regulating costs. Under strongly convex cost functions, we design an adaptive control algorithm based on online gradient descent to design and update the controllers that are constructed via a convex controller reparametrization. We show that our algorithm has $\tilde{\mathcal{O}}(\sqrt{T})$ regret via the explore-and-commit approach, and that if the model estimates are updated in epochs using closed-loop data collection, it attains the optimal regret of $\text{polylog}(T)$ after $T$ time-steps of interaction. For the case of convex quadratic cost functions, we propose an adaptive control algorithm that deploys the optimism-in-the-face-of-uncertainty principle to design the controller. In this setting, we show that the explore-and-commit approach has a regret upper bound of $\tilde{\mathcal{O}}(T^{2/3})$, and that adaptive control with continuous model estimate updates attains $\tilde{\mathcal{O}}(\sqrt{T})$ regret after $T$ time-steps.

【4】 Targeting Underrepresented Populations in Precision Medicine: A Federated Transfer Learning Approach
Link: https://arxiv.org/abs/2108.12112

Authors: Sai Li, Tianxi Cai, Rui Duan
Affiliations: Institute of Statistics and Big Data, Renmin University of China; Department of Biostatistics, Harvard University
Abstract: The limited representation of minorities and disadvantaged populations in large-scale clinical and genomics research has become a barrier to translating precision medicine research into practice. Due to heterogeneity across populations, risk prediction models are often found to underperform in these underrepresented populations, and therefore may further exacerbate known health disparities. In this paper, we propose a two-way data integration strategy that integrates heterogeneous data from diverse populations and from multiple healthcare institutions via a federated transfer learning approach. The proposed method can handle the challenging setting where sample sizes from different populations are highly unbalanced. With only a small number of communications across participating sites, the proposed method can achieve performance comparable to pooled analysis, where individual-level data are directly pooled together. We show that the proposed method improves the estimation and prediction accuracy in underrepresented populations and reduces the gap in model performance across populations. Our theoretical analysis reveals how estimation accuracy is influenced by communication budgets, privacy restrictions, and heterogeneity across populations. We demonstrate the feasibility and validity of our methods through numerical experiments and a real application to a multi-center study, in which we construct polygenic risk prediction models for Type II diabetes in the AA population.

Reinforcement learning (3 papers)

【1】 Reinforcement Learning based Condition-oriented Maintenance Scheduling for Flow Line Systems
Link: https://arxiv.org/abs/2108.12298

Authors: Raphael Lamprecht, Ferdinand Wurst, Marco F. Huber
Affiliations: Center for Cyber Cognitive Intelligence (CCI), Fraunhofer IPA, Stuttgart, Germany; Institute of Industrial Manufacturing and Management IFF, University of Stuttgart
Note: Accepted at the IEEE International Conference on Industrial Informatics (INDIN) 2021
Abstract: Maintenance scheduling is a complex decision-making problem in the production domain, where a number of maintenance tasks and resources have to be assigned and scheduled to production entities in order to prevent unplanned production downtime. Intelligent maintenance strategies are required that are able to adapt to the dynamics and different conditions of production systems. This paper introduces a deep reinforcement learning approach for condition-oriented maintenance scheduling in flow line systems. Different policies are learned, analyzed, and evaluated against a benchmark scheduling heuristic based on reward modelling. The evaluation of the learned policies shows that reinforcement learning based maintenance strategies meet the requirements of the presented use case and are suitable for maintenance scheduling on the shop floor.

【2】 Deep Reinforcement Learning for Wireless Resource Allocation Using Buffer State Information
Link: https://arxiv.org/abs/2108.12198

Authors: Eike-Manuel Bansbach, Victor Eliachevitch, Laurent Schmalen
Affiliations: Communications Engineering Lab, Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany
Note: accepted for publication at GLOBECOM 2021
Abstract: As the number of user equipments (UEs) with various data rate and latency requirements increases in wireless networks, the resource allocation problem for orthogonal frequency-division multiple access (OFDMA) becomes challenging. In particular, varying requirements lead to a non-convex optimization problem when maximizing the system's data rate while preserving fairness between UEs. In this paper, we solve the non-convex optimization problem using deep reinforcement learning (DRL). We outline, train and evaluate a DRL agent which performs the task of media access control scheduling for a downlink OFDMA scenario. To kickstart the training of our agent, we introduce mimicking learning. To improve scheduling performance, full buffer state information at the base station (e.g. packet age, packet size) is taken into account. Techniques like input feature compression, packet shuffling and age capping further improve the performance of the agent. We train and evaluate our agents using Nokia's wireless suite and evaluate them against different benchmark agents. We show that our agents clearly outperform the benchmark agents.

【3】 Reinforcement Learning-powered Semantic Communication via Semantic Similarity
Link: https://arxiv.org/abs/2108.12121

Authors: Kun Lu, Rongpeng Li, Xianfu Chen, Zhifeng Zhao, Honggang Zhang
Note: 13 pages, 7 figures. Code available on GitHub
Abstract: We introduce a new semantic communication mechanism, whose key idea is to preserve the semantic information instead of strictly securing bit-level precision. Starting from an analysis of the defects of existing joint source-channel coding (JSCC) methods, we show that the commonly used bit-level metrics are poor at capturing important semantic meanings and structures. To address this problem, we take advantage of learning from semantic similarity, instead of relying on conventional paired bit-level supervision such as cross entropy and bit error rate. However, developing such a semantic communication system is a nontrivial task, considering the non-differentiability of most semantic metrics as well as the instability introduced by noisy channels. To further resolve these issues, we put forward a reinforcement learning (RL)-based solution which allows us to simultaneously optimize any user-defined semantic measurement using the policy gradient technique, and to interact with the surrounding noisy environment in a natural way. We have tested the proposed method on the challenging European Parliament dataset. Experiments on both AWGN and phase-invariant fading channels have confirmed the superiority of our method in preserving semantic meaning and in better handling channel noise, especially in low-SNR situations. Apart from the experimental results, we further provide an in-depth look at how the semantic model behaves, along with its strong generalization ability in real-life examples. As a brand-new method for learning-based JSCC tasks, we also exemplify an RL-based image transmission paradigm, both to prove the generalization ability and to leave this new topic for future discussion.
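
The policy-gradient idea that makes non-differentiable semantic metrics optimizable can be conveyed with a textbook REINFORCE loop: the reward is treated as a black box, and only the log-probability of the sampled action is differentiated. The toy policy, state, and reward below are illustrative stand-ins, not the paper's transceiver or semantic metric.

```python
import torch
import torch.nn as nn

policy = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

def semantic_reward(action):          # black-box stand-in for a semantic metric
    return float(action == 2)

for _ in range(100):
    state = torch.randn(8)
    dist = torch.distributions.Categorical(logits=policy(state))
    action = dist.sample()
    reward = semantic_reward(action.item())
    loss = -dist.log_prob(action) * reward  # score-function (REINFORCE) gradient
    opt.zero_grad()
    loss.backward()
    opt.step()
```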

Symbolic | symbolic learning (1 paper)

【1】 DomiKnowS: A Library for Integration of Symbolic Domain Knowledge in Deep Learning
Link: https://arxiv.org/abs/2108.12370

Authors: Hossein Rajaby Faghihi, Quan Guo, Andrzej Uszok, Aliakbar Nafar, Elaheh Raisi, Parisa Kordjamshidi
Affiliations: Michigan State University, Sichuan University, Florida Institute for Human and Machine Cognition
Note: Accepted at EMNLP 2021 demo track
Abstract: We demonstrate a library for the integration of domain knowledge in deep learning architectures. Using this library, the structure of the data is expressed symbolically via graph declarations, and logical constraints over outputs or latent variables can be seamlessly added to the deep models. The domain knowledge can be defined explicitly, which improves the models' explainability in addition to their performance and generalizability in the low-data regime. Several approaches for such an integration of symbolic and sub-symbolic models have been introduced; however, there is no library that facilitates programming such an integration in a generic way while various underlying algorithms can be used. Our library aims to simplify programming for such an integration in both the training and inference phases while separating the knowledge representation from the learning algorithms. We showcase various NLP benchmark tasks and beyond. The framework is publicly available on GitHub (https://github.com/HLR/DomiKnowS).

Medicine-related (2 papers)

【1】 Detecting Propaganda on the Sentence Level during the COVID-19 Pandemic
Link: https://arxiv.org/abs/2108.12269

Authors: Rong-Ching Chang, Chu-Hsing Lin
Affiliations: Department of Computer Science, Tunghai University
Abstract: The spread of misinformation, conspiracy, and questionable content, along with information manipulation by foreign adversaries on social media, has surged along with the COVID-19 pandemic. Such malicious cyber-enabled actions may cause increasing social polarization, health crises, and property loss. In this paper, using fine-tuned contextualized embeddings trained on Reddit, we tackle the detection of such propaganda user accounts and their targeted issues on Twitter during March 2020, when the COVID-19 epidemic became recognized as a pandemic. Our results show that the pro-China group appeared to be tweeting 35 to 115 times more than the neutral group. At the same time, neutral groups were tweeting more positive-attitude content and voicing alarm about the COVID-19 situation. The pro-China group also used more call-for-action words on political issues not necessarily related to China.

【2】 Anomaly Detection in Medical Imaging -- A Mini Review
Link: https://arxiv.org/abs/2108.11986

Authors: Maximilian E. Tschuchnig, Michael Gadermayr
Affiliations: Information Technology and Systems Management, Salzburg University of Applied Sciences, Urstein Süd, Puch, Austria
Note: Conference: iDSC2021
Abstract: The increasing digitization of medical imaging enables machine learning based improvements in detecting, visualizing and segmenting lesions, easing the workload of medical experts. However, supervised machine learning requires reliable labelled data, which is often difficult or impossible to collect, or at least time-consuming and thereby costly. Therefore, methods requiring only partly labeled data (semi-supervised) or no labeling at all (unsupervised methods) have been applied more regularly. Anomaly detection is one possible methodology that is able to leverage semi-supervised and unsupervised methods to handle medical imaging tasks like classification and segmentation. This paper uses a semi-exhaustive literature review of relevant anomaly detection papers in medical imaging to cluster them into applications, highlight important results, establish lessons learned, and give further advice on how to approach anomaly detection in medical imaging. The qualitative analysis is based on Google Scholar and four different search terms, resulting in 120 analysed papers. The main results show that current research is mostly motivated by reducing the need for labelled data. Also, the successful and substantial amount of research in the brain MRI domain shows potential for applications in further domains like OCT and chest X-ray.

Autonomous driving | vehicles | lane detection, etc. (5 papers)

【1】 Deep Information Fusion for Electric Vehicle Charging Station Occupancy Forecasting
Link: https://arxiv.org/abs/2108.12352

Authors: Ashutosh Sao, Nicolas Tempelmeier, Elena Demidova
Abstract: With an increasing number of electric vehicles, the accurate forecasting of charging station occupation is crucial to enable reliable vehicle charging. This paper introduces a novel Deep Fusion of Dynamic and Static Information model (DFDS) to effectively forecast charging station occupation. We exploit static information, such as the mean occupation with respect to the time of day, to learn charging-station-specific patterns. We supplement such static data with dynamic information reflecting the preceding charging station occupation and temporal information such as daytime and weekday. Our model efficiently fuses dynamic and static information to facilitate accurate forecasting. We evaluate the proposed model on a real-world dataset containing 593 charging stations in Germany, covering August 2020 to December 2020. Our experiments demonstrate that DFDS outperforms the baselines by 3.45 percentage points in F1-score on average.

【2】 Improving callsign recognition with air-surveillance data in air-traffic communication
Link: https://arxiv.org/abs/2108.12156

Authors: Iuliia Nigmatulina, Rudolf Braun, Juan Zuluaga-Gomez, Petr Motlicek
Affiliations: Idiap Research Institute, Martigny, Switzerland; Institute of Computational Linguistics, University of Zürich; Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland
Note: Submitted to Interspeech 2021
Abstract: Automatic Speech Recognition (ASR) can be used to assist speech communication between pilots and air-traffic controllers. Its application can significantly reduce the complexity of the task and increase the reliability of transmitted information. Evidently, high-accuracy predictions are needed to minimize the risk of errors. In particular, high accuracy is required in the recognition of key information, such as commands and callsigns, used to navigate pilots. Our results prove that surveillance data containing callsigns can help to considerably improve the recognition of a callsign in an utterance when the weights of probable callsign n-grams are reduced per utterance. In this paper, we investigate two approaches: (1) G-boosting, where callsign weights are adjusted at the language model level (G) and followed by the dynamic decoder with an on-the-fly composition, and (2) lattice rescoring, where callsign information is introduced on top of lattices generated using a conventional decoder. Boosting callsign n-grams with the combination of the two methods allowed us to gain 28.4% absolute improvement in callsign recognition accuracy and up to 74.2% relative improvement in the WER of callsign recognition.

【3】 Densely-Populated Traffic Detection using YOLOv5 and Non-Maximum Suppression Ensembling
Link: https://arxiv.org/abs/2108.12118

Authors: Raian Rahman, Zadid Bin Azad, Md. Bakhtiar Hasan
Affiliations: Department of Computer Science and Engineering, Islamic University of Technology
Note: 13 pages, 4 figures; conference: International Conference on Big Data, IoT and Machine Learning 2021 (BIM 2021)
Abstract: Vehicular object detection is the heart of any intelligent traffic system. It is essential for urban traffic management. R-CNN, Fast R-CNN, Faster R-CNN and YOLO were some of the earlier state-of-the-art models. Region-based CNN methods have the problem of high inference time, which makes it unrealistic to use the model in real time. YOLO, on the other hand, struggles to detect small objects that appear in groups. In this paper, we propose a method that can locate and classify vehicular objects in a given densely crowded image using YOLOv5. The shortcomings of YOLO were solved by ensembling 4 different models. Our proposed model performs well on images taken from both top view and side view of the street, in both day and night. The performance of our proposed model was measured on the Dhaka AI dataset, which contains densely crowded vehicular images. Our experiments show that our model achieved mAP@0.5 of 0.458 with an inference time of 0.75 sec, outperforming other state-of-the-art models. Hence, the model can be deployed in the street for real-time traffic detection, which can be used for traffic control and data collection.
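
Ensembling several detectors typically means pooling their predicted boxes and running non-maximum suppression (NMS) over the union. Below is a standard greedy NMS sketch in NumPy; the paper's exact ensembling scheme may differ.

```python
import numpy as np

def nms(boxes, scores, iou_thr=0.5):
    """Greedy NMS. boxes: (N, 4) as [x1, y1, x2, y2]; returns kept indices."""
    order = np.argsort(-scores)
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # intersection of the top-scoring box with the remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_r - inter)
        order = order[1:][iou < iou_thr]  # drop boxes overlapping the kept one
    return keep
```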

【4】 Identification of Vehicle Dynamics Parameters Using Simulation-based Inference
Link: https://arxiv.org/abs/2108.12114

Authors: Ali Boyali, Simon Thompson, David Robert Wong
Note: Presented at the Autoware Workshop of the IEEE Intelligent Vehicles Symposium (IV 2021)
Abstract: Identifying tire and vehicle parameters is an essential step in designing control and planning algorithms for autonomous vehicles. This paper proposes a new method, Simulation-Based Inference (SBI), a modern interpretation of Approximate Bayesian Computation (ABC), for parameter identification. Simulation-based inference is an emerging method in the machine learning literature and has proven to yield accurate results for many parameter sets in complex problems. We demonstrate in this paper that it can handle the identification of highly nonlinear vehicle dynamics parameters and gives accurate estimates of the parameters of the governing equations.
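
Since SBI is presented as a modern interpretation of ABC, a classical ABC rejection sampler conveys the core idea: draw parameters from a prior, simulate, and keep those whose simulated output lies close to the observation. The toy linear "vehicle" response below is purely illustrative; modern SBI replaces rejection with learned neural density estimators.

```python
import numpy as np

def abc_rejection(observed, simulate, prior_sample, eps=0.1, n=10000):
    """Keep prior draws whose simulation is within eps of the observation."""
    accepted = []
    for _ in range(n):
        theta = prior_sample()
        if np.linalg.norm(simulate(theta) - observed) < eps:
            accepted.append(theta)
    return np.array(accepted)

# Toy example: identify the gain theta of a linear response y = theta * u
u = np.linspace(0, 1, 20)
observed = 2.0 * u
posterior = abc_rejection(observed,
                          simulate=lambda th: th * u,
                          prior_sample=lambda: np.random.uniform(0, 5))
```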

【5】 DeepFlow: Abnormal Traffic Flow Detection Using Siamese Networks
Link: https://arxiv.org/abs/2108.12016

Authors: Sepehr Sabour, Sanjeev Rao, Majid Ghaderi
Affiliations: Department of Computer Science, University of Calgary
Note: 7 pages, 12 figures, 3 tables
Abstract: Nowadays, many cities are equipped with surveillance systems and traffic control centers to monitor vehicular traffic for road safety and efficiency. The monitoring process is mostly done manually, which is inefficient and expensive. In recent years, several data-driven solutions have been proposed in the literature to automatically analyze traffic flow data using machine learning techniques. However, existing solutions require large and comprehensive datasets for training, which are not readily available, thus limiting their application. In this paper, we develop a traffic anomaly detection system, referred to as DeepFlow, based on Siamese neural networks, which are suitable in scenarios where only small datasets are available for training. Our model can detect abnormal traffic flows by analyzing the trajectory data collected from the vehicles in a fleet. To evaluate DeepFlow, we use realistic vehicular traffic simulations in SUMO. Our results show that DeepFlow detects abnormal traffic patterns with an F1-score of 78%, outperforming other existing approaches including Dynamic Time Warping (DTW), Global Alignment Kernels (GAK), and iForest.

Point clouds | SLAM | radar | LiDAR | depth/RGBD (1 paper)

【1】 A Pedestrian Detection and Tracking Framework for Autonomous Cars: Efficient Fusion of Camera and LiDAR Data
Link: https://arxiv.org/abs/2108.12375

Authors: Muhammad Mobaidul Islam, Abdullah Al Redwan Newaz, Ali Karimoddini
Affiliations: Department of Electrical and Computer Engineering, North Carolina A&T State University
Abstract: This paper presents a novel method for pedestrian detection and tracking by fusing camera and LiDAR sensor data. To deal with the challenges associated with autonomous driving scenarios, an integrated tracking and detection framework is proposed. The detection phase is performed by converting LiDAR streams to computationally tractable depth images; then, a deep neural network is developed to identify pedestrian candidates in both RGB and depth images. To provide accurate information, the detection phase is further enhanced by fusing multi-modal sensor information using a Kalman filter. The tracking phase is a combination of Kalman-filter prediction and an optical flow algorithm to track multiple pedestrians in a scene. We evaluate our framework on a real public driving dataset. Experimental results demonstrate that the proposed method achieves significant performance improvement over a baseline method that solely uses image-based pedestrian detection.
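
Fusing multi-modal position measurements with a Kalman filter can be sketched as sequential measurement updates, one per sensor. The constant-velocity model and the noise levels below are assumptions chosen for illustration, not the paper's filter design.

```python
import numpy as np

x = np.zeros(2)             # state: [position, velocity]
P = np.eye(2)               # state covariance
F = np.array([[1.0, 0.1],   # constant-velocity model, dt = 0.1 s
              [0.0, 1.0]])
Q = 0.01 * np.eye(2)        # process noise
H = np.array([[1.0, 0.0]])  # both sensors measure position only

def update(x, P, z, r):
    """One Kalman measurement update with measurement z and noise variance r."""
    S = H @ P @ H.T + r                 # innovation covariance
    K = P @ H.T / S                     # Kalman gain (scalar measurement)
    x = x + (K * (z - H @ x)).ravel()
    P = (np.eye(2) - K @ H) @ P
    return x, P

x, P = F @ x, F @ P @ F.T + Q           # predict
x, P = update(x, P, z=1.02, r=0.05)     # camera measurement
x, P = update(x, P, z=0.98, r=0.02)     # LiDAR measurement (less noisy)
```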

Inference | analysis | understanding | explanation (3 papers)

【1】 FAST-PCA: A Fast and Exact Algorithm for Distributed Principal Component Analysis
Link: https://arxiv.org/abs/2108.12373

Authors: Arpita Gang, Waheed U. Bajwa
Affiliations: Department of Electrical and Computer Engineering, Rutgers University–New Brunswick
Note: 28 pages; preprint of a journal submission
Abstract: Principal Component Analysis (PCA) is a fundamental data preprocessing tool in the world of machine learning. While PCA is often reduced to dimension reduction, its purpose is actually two-fold: dimension reduction and feature learning. Furthermore, the enormity of the dimensions and sample sizes of modern datasets has rendered centralized PCA solutions unusable. In that vein, this paper reconsiders the problem of PCA when data samples are distributed across nodes in an arbitrarily connected network. While a few solutions for distributed PCA exist, they either overlook the feature-learning part of the purpose, have communication overhead that makes them inefficient, and/or lack exact convergence guarantees. To combat these issues, this paper proposes a distributed PCA algorithm called FAST-PCA (Fast and exAct diSTributed PCA). The proposed algorithm is efficient in terms of communication and can be proved to converge linearly and exactly to the principal components, leading to dimension reduction as well as uncorrelated features. Our claims are further supported by experimental results.
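
To convey the flavor of exact distributed PCA: if zero-mean data are split across nodes, the average of the local covariance matrices equals the global covariance, so its eigenvectors are the exact principal components. FAST-PCA performs such averaging via consensus iterations over the network; the direct averaging below is only a centralized illustration under an even data split.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.standard_normal((300, 10)) @ rng.standard_normal((10, 10))
nodes = np.split(data, 3)                      # data held by 3 network nodes

local_cov = [x.T @ x / len(x) for x in nodes]  # per-node covariance (zero-mean)
global_cov = sum(local_cov) / len(local_cov)   # network-wide average

eigvals, eigvecs = np.linalg.eigh(global_cov)
top_k = eigvecs[:, ::-1][:, :2]                # top-2 principal components
```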

【2】 Active Inference for Stochastic Control
Link: https://arxiv.org/abs/2108.12245

Authors: Aswin Paul, Noor Sajid, Manoj Gopalkrishnan, Adeel Razi
Affiliations: IITB-Monash Research Academy, Mumbai, India; Department of Electrical Engineering, IIT Bombay, Mumbai, India; Turner Institute for Brain and Mental Health, Monash University, Australia; Wellcome Trust Centre for Human Neuroimaging, UCL, United Kingdom
Note: 12 pages, 5 figures; accepted for presentation at IWAI 2021 (ECML-PKDD)
Abstract: Active inference has emerged as an alternative approach to control problems given its intuitive (probabilistic) formalism. However, despite its theoretical utility, computational implementations have largely been restricted to low-dimensional, deterministic settings. This paper highlights that this is a consequence of the inability to adequately model stochastic transition dynamics, particularly when an extensive policy (i.e., action trajectory) space must be evaluated during planning. Fortunately, recent advancements propose a modified planning algorithm for finite temporal horizons. We build upon this work to assess the utility of active inference for a stochastic control setting. For this, we simulate the classic windy grid-world task with additional complexities, namely: 1) environment stochasticity; 2) learning of transition dynamics; and 3) partial observability. Our results demonstrate the advantage of using active inference, compared to reinforcement learning, in both deterministic and stochastic settings.

【3】 Understanding the Logit Distributions of Adversarially-Trained Deep Neural Networks
Link: https://arxiv.org/abs/2108.12001

Authors: Landan Seguin, Anthony Ndirango, Neeli Mishra, SueYeon Chung, Tyler Lee
Affiliations: Intel Labs, Columbia University
Note: 29 pages (13 main, 16 supplemental), 22 figures (5 main, 17 supplemental)
Abstract: Adversarial defenses train deep neural networks to be invariant to the input perturbations from adversarial attacks. Almost all defense strategies achieve this invariance through adversarial training, i.e. training on inputs with adversarial perturbations. Although adversarial training is successful at mitigating adversarial attacks, the behavioral differences between adversarially-trained (AT) models and standard models are still poorly understood. Motivated by a recent study on learning robustness without input perturbations by distilling an AT model, we explore what is learned during adversarial training by analyzing the distribution of logits in AT models. We identify three logit characteristics essential to learning adversarial robustness. First, we provide a theoretical justification for the finding that adversarial training shrinks two important characteristics of the logit distribution: the max logit values and the "logit gaps" (difference between the logit max and next largest values) are on average lower for AT models. Second, we show that AT and standard models differ significantly on which samples are high or low confidence, then illustrate clear qualitative differences by visualizing samples with the largest confidence difference. Finally, we find learning information about incorrect classes to be essential to learning robustness by manipulating the non-max logit information during distillation and measuring the impact on the student's robustness. Our results indicate that learning some adversarial robustness without input perturbations requires a model to learn specific sample-wise confidences and incorrect class orderings that follow complex distributions.
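
The two logit characteristics the paper analyzes are straightforward to compute; a small sketch (random logits as stand-in data):

```python
import numpy as np

def logit_stats(logits):
    """Max logit and 'logit gap' (max minus second-largest) per sample.
    logits: (batch, classes)."""
    top2 = np.sort(logits, axis=1)[:, -2:]
    max_logit = top2[:, 1]
    logit_gap = top2[:, 1] - top2[:, 0]
    return max_logit, logit_gap

max_logit, gap = logit_stats(np.random.randn(4, 10))
```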

Detection-related (3 papers)

【1】 Man versus Machine: AutoML and Human Experts' Role in Phishing Detection
Link: https://arxiv.org/abs/2108.12193

Authors: Rizka Purwanto, Arindam Pal, Alan Blair, Sanjay Jha
Affiliations: University of New South Wales, Australia; Data61, CSIRO, Sydney, Australia; Cyber Security Cooperative Research Centre, Australia
Abstract: Machine learning (ML) has developed rapidly in the past few years and has successfully been utilized for a broad range of tasks, including phishing detection. However, building an effective ML-based detection system is not a trivial task, and requires data scientists with knowledge of the relevant domain. Automated Machine Learning (AutoML) frameworks have received a lot of attention in recent years, enabling non-ML experts to build machine learning models. This raises the intriguing question of whether AutoML can outperform the results achieved by human data scientists. Our paper compares the performance of six well-known, state-of-the-art AutoML frameworks on ten different phishing datasets to see whether AutoML-based models can outperform manually crafted machine learning models. Our results indicate that AutoML-based models are able to outperform manually developed machine learning models in complex classification tasks, specifically on datasets whose features are not quite discriminative, and on datasets with overlapping classes or relatively high degrees of non-linearity. Challenges also remain in building a real-world phishing detection system using AutoML frameworks, due to the current support only for supervised classification problems, the resulting need for labeled data, and the inability to update AutoML-based models incrementally. This indicates that experts with knowledge in the domain of phishing and cybersecurity are still essential in the loop of the phishing detection pipeline.

【2】 Anomaly Detection of Defect using Energy of Point Pattern Features within Random Finite Set Framework
Link: https://arxiv.org/abs/2108.12159

Authors: Ammar Mansoor Kamoona, Amirali Khodadadian Gostar, Alireza Bab-Hadiashar, Reza Hoseinnezhad
Affiliations: Royal Melbourne Institute of Technology
Note: 17 pages; to be submitted to the TII journal
Abstract: In this paper, we propose an efficient approach for industrial defect detection that is modeled as anomaly detection using point pattern data. Most recent works use global features for feature extraction to summarize image content. However, global features are not robust against lighting and viewpoint changes and do not describe the image's geometrical information, which should be fully utilized in the manufacturing industry. To the best of our knowledge, we are the first to propose using transfer learning of local/point pattern features to overcome these limitations and capture the geometrical information of image regions. We model these local/point pattern features as a random finite set (RFS). In addition, we propose the RFS energy, in contrast to the RFS likelihood, as the anomaly score. The similarity distribution of point pattern features of normal samples is modeled as a multivariate Gaussian. Parameter learning of the proposed RFS energy does not require any heavy computation. We evaluate the proposed approach on the MVTec AD dataset, a multi-object defect detection dataset. Experimental results show the outstanding performance of our proposed approach compared to the state-of-the-art methods, and the proposed RFS energy outperforms the state-of-the-art in the few-shot learning setting.
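
The paper's RFS energy is defined over sets of point-pattern features, but the underlying idea of scoring a sample against a fitted multivariate Gaussian can be sketched with a per-vector negative log-density. This is a deliberately simplified stand-in, not the RFS formulation itself.

```python
import numpy as np

def gaussian_energy(x, mean, cov):
    """Negative log-density under a multivariate Gaussian as an anomaly score."""
    d = x - mean
    inv = np.linalg.inv(cov)
    _, logdet = np.linalg.slogdet(cov)
    return 0.5 * (d @ inv @ d + logdet + len(x) * np.log(2 * np.pi))

feats = np.random.randn(200, 5)                          # normal-sample features
mean, cov = feats.mean(0), np.cov(feats, rowvar=False)
score = gaussian_energy(np.random.randn(5), mean, cov)   # higher = more anomalous
```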

【3】 Anomaly Detection on IT Operation Series via Online Matrix Profile 标题:基于在线矩阵轮廓的IT运行序列异常检测 链接:https://arxiv.org/abs/2108.12093

作者:Shi-Ying Lan,Run-Qing Chen,Wan-Lei Zhao 机构:Xiamen University, Xiamen, China 备注:10 pages, 6 figures; Shi-Ying Lan and Run-Qing Chen contributed equally 摘要:时间序列异常检测是监控IT系统关键性能指标(KPI)的一项基本任务。文献中现有的方法要么需要大量的训练资源,要么难以在实际场景中部署。本文提出无需训练的在线矩阵轮廓(online matrix profile)来解决这一问题:通过参考与当前子序列最接近的历史子序列来检测异常。在在线矩阵轮廓的基础上引入了距离显著性,当异常发生时它会呈现出显著的模式。另一种无需训练的谱残差方法也被集成进来,以进一步提高检测精度。此外,通过引入缓存策略,对于长时间序列,该方法的速度至少提高了四倍。与现有方法相比,在线矩阵轮廓在准确性和效率之间取得了很好的权衡。更重要的是,由于不受任何已训练模型的约束,它对各种类型的时间序列都具有通用性。 摘要:Anomaly detection on time series is a fundamental task in monitoring the Key Performance Indicators (KPIs) of IT systems. The existing approaches in the literature either require a lot of training resources or are hard to be deployed in real scenarios. In this paper, the online matrix profile, which requires no training, is proposed to address this issue. The anomalies are detected by referring to the past subsequence that is the closest to the current one. The distance significance is introduced based on the online matrix profile, which demonstrates a prominent pattern when an anomaly occurs. Another training-free approach spectral residual is integrated into our approach to further enhance the detection accuracy. Moreover, the proposed approach is sped up by at least four times for long time series by the introduced cache strategy. In comparison to the existing approaches, the online matrix profile makes a good trade-off between accuracy and efficiency. More importantly, it is generic to various types of time series in the sense that it works without the constraint from any trained model.
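矩阵轮廓的核心量是"当前子序列到历史子序列的最近邻距离"。下面用 numpy 给出一个朴素的在线版本示意(逐帧全量重算,未实现论文中的缓存加速与谱残差集成;子序列长度、排除区宽度与数据均为假设):

```python
# 示意:在线矩阵轮廓,即当前子序列与全部历史子序列的最小 z 规范化欧氏距离
import numpy as np

def znorm(x):
    return (x - x.mean()) / (x.std() + 1e-8)

def online_matrix_profile(ts, m=16, exclusion=8):
    """对序列 ts 逐点输出轮廓值;m 为子序列长度,exclusion 避免与自身邻域比较"""
    profile = []
    for t in range(m, len(ts) + 1):
        cur = znorm(ts[t - m:t])
        dists = [np.linalg.norm(cur - znorm(ts[e - m:e]))
                 for e in range(m, t - exclusion)]      # 只参考"过去"的子序列
        profile.append(min(dists) if dists else np.inf)
    return np.array(profile)

ts = np.sin(np.linspace(0, 16 * np.pi, 400))
ts[300:308] += 3.0                                      # 注入一段异常
mp = online_matrix_profile(ts)
mp[~np.isfinite(mp)] = 0.0                              # 序列开头无历史可比
print("最异常的子序列终点约在:", int(np.argmax(mp)) + 16)
```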

分类|识别(4篇)

【1】 Application of Classification and Feature Selection in Building Energy Simulations 标题:分类和特征选择在建筑能耗模拟中的应用 链接:https://arxiv.org/abs/2108.12363

作者:Fatemeh Shahsavari,Zohreh Shaghaghian 机构: Texas A&M, College Station, TX, United States 摘要:建筑节能性能是基于性能的建筑设计决策的关键特征之一。建筑围护材料在提高建筑节能性能方面发挥着关键作用。建筑材料的热性能决定了建筑围护结构的传热水平,从而决定了建筑的年度热能性能。本研究应用线性判别分析(LDA)方法研究材料热性能对建筑热负荷的影响。特征选择采用了主成分分析(PCA)和穷举特征选择(EFS)两种方法。为加利福尼亚州洛杉矶的一座办公楼开发了一个假设设计场景,其中包括六种材料替代方案。基于LDA结果选择最佳设计方案,并基于PCA和EFS方法确定关键输入参数。PCA结果证实,在材料的所有热性能中,包括导热系数、密度、比热容量和厚度在内的四个参数是建筑热行为和热能消耗方面最关键的特征。这一结果与大多数建筑能耗模拟工具的假设非常吻合。 摘要:Building energy performance is one of the key features in performance-based building design decision making. Building envelope materials can play a key role in improving building energy performance. The thermal properties of building materials determine the level of heat transfer through building envelope, thus the annual thermal energy performance of the building. This research applies the Linear Discriminant Analysis (LDA) method to study the effects of materials' thermal properties on building thermal loads. Two approaches are adopted for feature selection including the Principal Component Analysis (PCA) and the Exhaustive Feature Selection (EFS). A hypothetical design scenario is developed with six material alternatives for an office building in Los Angeles, California. The best design alternative is selected based on the LDA results and the key input parameters are determined based on the PCA and EFS methods. The PCA results confirm that among all thermal properties of the materials, the four parameters including thermal conductivity, density, specific heat capacity, and thickness are the most critical features, in terms of building thermal behavior and thermal energy consumption. This result matches quite well with the assumptions of most of the building energy simulation tools.
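下面用 scikit-learn 给出"PCA 考察关键热物性 + LDA 对设计方案分类"组合流程的最小示意(特征与标签均为占位随机数据,并非论文的建筑能耗模拟结果;四列特征对应文中结论提到的导热系数、密度、比热容与厚度):

```python
# 示意:先用 PCA 的载荷观察哪些热物性最主导,再用 LDA 做方案分类
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
# 占位特征:导热系数、密度、比热容、厚度
X = rng.normal(size=(120, 4))
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)   # 占位的热负荷等级标签

pca = PCA(n_components=2).fit(X)
print("各主成分的载荷(幅值大者即更关键的特征):")
print(np.round(pca.components_, 2))

lda = LinearDiscriminantAnalysis().fit(X, y)
print("LDA 训练集准确率:", lda.score(X, y))
```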

【2】 Grammar Based Identification Of Speaker Role For Improving ATCO And Pilot ASR 标题:基于语法的说话人角色识别改进ATCO和Pilot ASR 链接:https://arxiv.org/abs/2108.12175

作者:Amrutha Prasad,Juan Zuluaga-Gomez,Petr Motlicek,Oliver Ohneiser,Hartmut Helmke,Saeed Sarfjoo,Iuliia Nigmatulina 机构:Idiap Research Institute, Martigny, Switzerland, Brno University of Technology, Brno, Czechia, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland, German Aerospace Center (DLR), Institute of Flight Guidance, Braunschweig, Germany 备注:Submitted to Interspeech 2021 摘要:用于空中交通管制的基于助手的语音识别(ABSR)通常通过汇集空中交通管制员(ATCO)和飞行员数据进行训练。实际上,这是因为与ATCO相比,飞行员数据的比例较小,而他们的标准通信语言相似。然而,由于ATCO和飞行员的数据不平衡以及他们不同的声学条件,ATCO的ASR性能通常明显优于飞行员。在本文中,我们建议(1)使用ATR笔录的自动方法分割ATCO和导频数据,和(2)考虑阿特科和导频ASR作为声学模型(AM)训练的两个单独任务。对于ATCO和飞行员数据的说话人角色分类,使用种子模型生成假设的ASR转录本,然后根据从国际民用航空组织(ICAO)定义的语法中提取的知识对说话人角色进行分类。这种方法为ATCO和pilot提供了83%的平均说话人角色识别准确率。最后,我们表明,与通过汇集所有数据进行的AM训练相比,针对每个任务单独训练AM或使用多任务方法非常适合此数据。 摘要:Assistant Based Speech Recognition (ABSR) for air traffic control is generally trained by pooling both Air Traffic Controller (ATCO) and pilot data. In practice, this is motivated by the fact that the proportion of pilot data is lesser compared to ATCO while their standard language of communication is similar. However, due to data imbalance of ATCO and pilot and their varying acoustic conditions, the ASR performance is usually significantly better for ATCOs than pilots. In this paper, we propose to (1) split the ATCO and pilot data using an automatic approach exploiting ASR transcripts, and (2) consider ATCO and pilot ASR as two separate tasks for Acoustic Model (AM) training. For speaker role classification of ATCO and pilot data, a hypothesized ASR transcript is generated with a seed model, subsequently used to classify the speaker role based on the knowledge extracted from grammar defined by International Civil Aviation Organization (ICAO). This approach provides an average speaker role identification accuracy of 83% for ATCO and pilot. Finally, we show that training AMs separately for each task, or using a multitask approach is well suited for this data compared to AM trained by pooling all data.

【3】 4-bit Quantization of LSTM-based Speech Recognition Models 标题:基于LSTM的语音识别模型的4位量化 链接:https://arxiv.org/abs/2108.12074

作者:Andrea Fasoli,Chia-Yu Chen,Mauricio Serrano,Xiao Sun,Naigang Wang,Swagath Venkataramani,George Saon,Xiaodong Cui,Brian Kingsbury,Wei Zhang,Zoltán Tüske,Kailash Gopalakrishnan 机构:IBM Research, USA 备注:5 pages, 3 figures, Andrea Fasoli and Chia-Yu Chen equally contributed to this work. Paper accepted to Interspeech 2021 摘要:我们研究了权重与激活的激进低精度表示对两大类基于LSTM的自动语音识别(ASR)体系结构的影响:混合深度双向LSTM-隐马尔可夫模型(DBLSTM-HMMs)和循环神经网络-转换器(RNN-Ts)。使用4位整数表示时,应用于这些模型LSTM部分的朴素(naive)量化方法会导致字错误率(WER)显著恶化。另一方面,我们表明,通过适当选择量化器和初始化,可以把精度损失降到最低。特别是,我们根据网络的局部特性定制量化方案,在限制计算时间的同时提高识别性能。我们在NIST Hub5-2000评估的Switchboard(SWB)和CallHome(CH)测试集上演示了我们的解决方案。使用300或2000小时SWB数据训练的DBLSTM-HMMs的平均WER恶化分别小于0.5%和1%。在更具挑战性的RNN-T模型上,我们的量化策略将4比特推理的性能恶化限制在1.3%以内。 摘要:We investigate the impact of aggressive low-precision representations of weights and activations in two families of large LSTM-based architectures for Automatic Speech Recognition (ASR): hybrid Deep Bidirectional LSTM - Hidden Markov Models (DBLSTM-HMMs) and Recurrent Neural Network - Transducers (RNN-Ts). Using a 4-bit integer representation, a naïve quantization approach applied to the LSTM portion of these models results in significant Word Error Rate (WER) degradation. On the other hand, we show that minimal accuracy loss is achievable with an appropriate choice of quantizers and initializations. In particular, we customize quantization schemes depending on the local properties of the network, improving recognition performance while limiting computational time. We demonstrate our solution on the Switchboard (SWB) and CallHome (CH) test sets of the NIST Hub5-2000 evaluation. DBLSTM-HMMs trained with 300 or 2000 hours of SWB data achieve <0.5% and <1% average WER degradation, respectively. On the more challenging RNN-T models, our quantization strategy limits degradation in 4-bit inference to 1.3%.
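下面给出一个对称均匀 4 比特"伪量化"(quantize-dequantize)函数的最小示意,可用于感受低精度表示带来的舍入误差(逐张量缩放;论文中依据网络局部特性定制量化器与初始化的做法此处未涉及):

```python
# 示意:对称均匀 4-bit 伪量化(量化后再反量化回浮点),numpy 实现
import numpy as np

def fake_quant_int4(w):
    """把张量 w 映射到 int4 网格 [-8, 7] 再反量化,模拟 4 比特表示"""
    qmin, qmax = -8, 7
    scale = np.abs(w).max() / qmax + 1e-12      # 逐张量缩放因子
    q = np.clip(np.round(w / scale), qmin, qmax)
    return q * scale

w = np.random.randn(4, 4).astype(np.float32)     # 占位的 LSTM 权重块
wq = fake_quant_int4(w)
print("max abs error:", np.abs(w - wq).max())
```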

【4】 Classification of Emotions and Evaluation of Customer Satisfaction from Speech in Real World Acoustic Environments 标题:真实声学环境中的语音情感分类与顾客满意度评价 链接:https://arxiv.org/abs/2108.11981

作者:Luis Felipe Parra-Gallego,Juan Rafael Orozco-Arroyave 机构:University of Antioquia UdeA, Medell´ın, Colombia., Konecta Group S.A.S. Medell´ın, Colombia., Pattern Recognition Lab. Friedrich Alexander University, Erlangen-Nuremberg 摘要:本文致力于寻找合适的特征,以便在真实声学场景中从语音中稳健地识别情感并评估客户满意度。情感的分类基于标准和知名语料库,客户满意度的评估基于客户在与呼叫中心代理打电话时对所接收服务的真实意见的记录。本研究中考虑的特征集包括两种说话人模型,即x向量和i向量,以及Interspeech 2010副语言学挑战赛(I2010PC)中引入的著名特征集。此外,我们还介绍了使用DisVoice框架提取的发音、发音和韵律特征作为备选特征集,从语音中稳健地建模情感和客户满意度。结果表明,I2010PC功能集是在文献中典型使用的标准数据库中对情绪进行分类的最佳方法。当考虑在呼叫中心收集的录音时,在没有任何声学条件控制的情况下,使用我们的发音功能可以获得最佳效果。I2010PC功能集包括1584个测量值,而铰接方法仅包括488个测量值。我们认为,所提出的方法更适合于声学条件不受控制的实际应用,并且可能更方便于工业应用。 摘要:This paper focuses on finding suitable features to robustly recognize emotions and evaluate customer satisfaction from speech in real acoustic scenarios. The classification of emotions is based on standard and well-known corpora and the evaluation of customer satisfaction is based on recordings of real opinions given by customers about the received service during phone calls with call-center agents. The feature sets considered in this study include two speaker models, namely x-vectors and i-vectors, and also the well known feature set introduced in the Interspeech 2010 Paralinguistics Challenge (I2010PC). Additionally, we introduce the use of phonation, articulation and prosody features extracted with the DisVoice framework as alternative feature sets to robustly model emotions and customer satisfaction from speech. The results indicate that the I2010PC feature set is the best approach to classify emotions in the standard databases typically used in the literature. When considering the recordings collected in the call-center, without any control over the acoustic conditions, the best results are obtained with our articulation features. The I2010PC feature set includes 1584 measures while the articulation approach only includes 488 measures. We think that the proposed approach is more suitable for real-world applications where the acoustic conditions are not controlled and also it is potentially more convenient for industrial applications.

表征(1篇)

【1】 Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies 标题:语言技术中性别排他性的危害和非二进制表示的挑战 链接:https://arxiv.org/abs/2108.12084

作者:Sunipa Dev,Masoud Monajatipoor,Anaelia Ovalle,Arjun Subramonian,Jeff M Phillips,Kai-Wei Chang 机构:sheher, UCLA, hehim, theyheshe, theythem, University of Utah 备注:None 摘要:在语言任务以及考察语言模型所传播的刻板印象时,性别问题被广泛讨论。然而,目前的讨论大多把性别当作二元的,这可能造成诸如非二元性别身份被反复抹除之类的伤害。这些危害源于模型和数据集的偏见,而这些偏见是社会对非二元性别缺乏认识和理解的后果。在本文中,我们解释了性别及其相关语言的复杂性,并调查了非二元人群,以了解在英语语言技术中将性别视为二元所带来的危害。我们还详细说明了当前的语言表征(如GloVe、BERT)如何捕捉并延续这些危害,以及为使表征公平地编码性别信息而必须正视和解决的相关挑战。 摘要:Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.

编码器(1篇)

【1】 Investigation of Nonlinear Model Order Reduction of the Quasigeostrophic Equations through a Physics-Informed Convolutional Autoencoder 标题:基于物理信息的卷积自动编码器对准地转方程非线性模型降阶的研究 链接:https://arxiv.org/abs/2108.12344

作者:Rachel Cooper,Andrey A. Popov,Adrian Sandu 机构:Computational Science Laboratory Report, CSL-TR-,-, “Compute the Future!”, Department of Computer Science, Virginia Tech, Blacksburg, VA , Phone: (,) ,-, Fax: (,) ,- 摘要:降阶建模(ROM)是一个技术领域,它通过廉价的替代物,以较少的自由度捕捉重要的动力学特性,来近似真实世界过程的复杂物理模型。传统的ROM技术,如适当正交分解(POD),侧重于动力学的线性投影到一组光谱特征上。在这篇文章中,我们探讨了使用自动编码器(AE)构造ROM,自动编码器将系统动力学非线性投影到从数据中学习到的低维流形上。该方法使用卷积神经网络(CNN)来学习空间特征,而不是光谱特征,并利用物理信息(PI)代价函数来捕获时间特征。我们使用准地转方程的研究表明,虽然PI代价函数有助于空间重建,但空间特征不如光谱特征强大,并且通过基于机器学习的方法构建ROM需要对新的非标准方法进行重大研究。 摘要:Reduced order modeling (ROM) is a field of techniques that approximates complex physics-based models of real-world processes by inexpensive surrogates that capture important dynamical characteristics with a smaller number of degrees of freedom. Traditional ROM techniques such as proper orthogonal decomposition (POD) focus on linear projections of the dynamics onto a set of spectral features. In this paper we explore the construction of ROM using autoencoders (AE) that perform nonlinear projections of the system dynamics onto a low dimensional manifold learned from data. The approach uses convolutional neural networks (CNN) to learn spatial features as opposed to spectral, and utilize a physics informed (PI) cost function in order to capture temporal features as well. Our investigation using the quasi-geostrophic equations reveals that while the PI cost function helps with spatial reconstruction, spatial features are less powerful than spectral features, and that construction of ROMs through machine learning-based methods requires significant investigation into novel non-standard methodologies.
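下面用 PyTorch 给出"卷积自编码器 + 物理信息(PI)代价项"骨架的示意,说明 PI 损失如何并入重构训练(物理项这里用占位的拉普拉斯平滑残差代替真实的准地转方程残差,网络结构与权重 λ 均为假设):

```python
# 示意:卷积自编码器 ROM,总损失 = 重构误差 + λ * 物理残差(占位残差)
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self, latent=16):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(8, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 8 * 8, latent))
        self.dec = nn.Sequential(
            nn.Linear(latent, 16 * 8 * 8), nn.Unflatten(1, (16, 8, 8)),
            nn.ConvTranspose2d(16, 8, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(8, 1, 4, stride=2, padding=1))

    def forward(self, x):
        return self.dec(self.enc(x))

def physics_residual(u):
    # 占位物理项:用离散拉普拉斯算子的平方均值代替真实 PDE 残差
    lap = (u[..., 2:, 1:-1] + u[..., :-2, 1:-1] +
           u[..., 1:-1, 2:] + u[..., 1:-1, :-2] - 4 * u[..., 1:-1, 1:-1])
    return (lap ** 2).mean()

model, lam = ConvAE(), 0.1
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 1, 32, 32)            # 占位的流场快照
x_hat = model(x)
loss = nn.functional.mse_loss(x_hat, x) + lam * physics_residual(x_hat)
opt.zero_grad(); loss.backward(); opt.step()
```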

优化|敛散性(1篇)

【1】 Provable Tensor-Train Format Tensor Completion by Riemannian Optimization 标题:基于黎曼优化的可证明张量-训练格式张量完备化 链接:https://arxiv.org/abs/2108.12163

作者:Jian-Feng Cai,Jingyang Li,Dong Xia 机构:Hong Kong University of Science and Technology 备注:71 pages, 5 figures 摘要:张量列(TT)格式在处理结构高阶张量方面具有诱人的优势。近十年来,TT格式张量在不同学科中得到了广泛的应用,其中张量补全引起了广泛的关注。许多快速算法,包括黎曼梯度下降(RGrad)算法,已经被提出用于TT格式的张量补全。然而,这些算法的理论保证大多缺失或次优,部分原因是TT格式分解中复杂的递归代数运算。此外,为其他格式的张量(例如Tucker和CP)建立的现有结果不适用,因为处理TT格式张量的算法有很大的不同,并且涉及更多。在本文中,我们提供了,据我们所知,第一个理论保证的收敛RGrad算法的TT格式张量完成,在一个近乎最佳的样本量条件。RGrad算法以无张量条件数的恒定收缩率线性收敛,无需重新调节。我们还提出了一种新的方法,称为序列二阶矩方法,以在相似的样本量要求下实现热初始化。作为一个副产品,我们的结果甚至大大改进了先前关于矩阵补全的RGrad算法的研究。数值实验证实了我们的理论发现,并展示了TT格式分解所获得的计算加速。 摘要:The tensor train (TT) format enjoys appealing advantages in handling structural high-order tensors. The recent decade has witnessed the wide applications of TT-format tensors from diverse disciplines, among which tensor completion has drawn considerable attention. Numerous fast algorithms, including the Riemannian gradient descent (RGrad) algorithm, have been proposed for the TT-format tensor completion. However, the theoretical guarantees of these algorithms are largely missing or sub-optimal, partly due to the complicated and recursive algebraic operations in TT-format decomposition. Moreover, existing results established for the tensors of other formats, for example, Tucker and CP, are inapplicable because the algorithms treating TT-format tensors are substantially different and more involved. In this paper, we provide, to our best knowledge, the first theoretical guarantees of the convergence of RGrad algorithm for TT-format tensor completion, under a nearly optimal sample size condition. The RGrad algorithm converges linearly with a constant contraction rate that is free of tensor condition number without the necessity of re-conditioning. We also propose a novel approach, referred to as the sequential second-order moment method, to attain a warm initialization under a similar sample size requirement. As a byproduct, our result even significantly refines the prior investigation of RGrad algorithm for matrix completion. Numerical experiments confirm our theoretical discovery and showcase the computational speedup gained by the TT-format decomposition.

预测|估计(2篇)

【1】 Parallel Machine Learning for Forecasting the Dynamics of Complex Networks 标题:用于预测复杂网络动力学的并行机器学习 链接:https://arxiv.org/abs/2108.12129

作者:Keshav Srinivasan,Nolan Coble,Joy Hamlin,Thomas Antonsen,Edward Ott,Michelle Girvan 机构:University of Maryland, College Park, Maryland , SUNY Brockport, New York , USA., Stony Brook University, New York , USA 摘要:从以前的时间序列数据预测大型复杂网络的动态在广泛的环境中非常重要。在这里,我们提出了一个机器学习方案,使用一个模拟感兴趣的网络拓扑结构的并行架构来完成这项任务。我们证明了我们的方法的实用性和可扩展性,该方法是在混沌振荡器网络上使用水库计算实现的。考虑两个层次的先验知识:(i)已知网络链路;和(ii)网络链路未知,并通过数据驱动方法推断,以近似优化预测。 摘要:Forecasting the dynamics of large complex networks from previous time-series data is important in a wide range of contexts. Here we present a machine learning scheme for this task using a parallel architecture that mimics the topology of the network of interest. We demonstrate the utility and scalability of our method implemented using reservoir computing on a chaotic network of oscillators. Two levels of prior knowledge are considered: (i) the network links are known; and (ii) the network links are unknown and inferred via a data-driven approach to approximately optimize prediction.
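该方法的基本构件是水库计算(reservoir computing)。下面给出单个回声状态网络(ESN)做一步预测的最小示意(论文为每个网络节点配一个这样的储备池并按网络拓扑并行耦合,此处省略耦合;谱半径、储备池规模等超参数均为假设):

```python
# 示意:单节点回声状态网络(ESN)的一步预测,岭回归读出
import numpy as np

rng = np.random.default_rng(0)
N, rho, T = 200, 0.9, 2000                 # 储备池规模、谱半径、序列长度
W = rng.normal(size=(N, N))
W *= rho / np.max(np.abs(np.linalg.eigvals(W)))   # 调整谱半径
W_in = rng.uniform(-0.5, 0.5, size=(N, 1))

u = np.sin(np.linspace(0, 60, T))[:, None]  # 占位输入(真实场景为节点观测)
states = np.zeros((T, N)); r = np.zeros(N)
for t in range(T):
    r = np.tanh(W @ r + W_in @ u[t])        # 储备池状态更新
    states[t] = r

# 岭回归读出:用 r(t) 预测 u(t+1)
X, Y = states[:-1], u[1:]
W_out = np.linalg.solve(X.T @ X + 1e-6 * np.eye(N), X.T @ Y)
pred = X @ W_out
print("one-step MSE:", float(np.mean((pred - Y) ** 2)))
```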

【2】 A comparison of approaches to improve worst-case predictive model performance over patient subpopulations 标题:改善患者亚群最坏情况预测模型性能的方法比较 链接:https://arxiv.org/abs/2108.12250

作者:Stephen R. Pfohl,Haoran Zhang,Yizhe Xu,Agata Foryciarz,Marzyeh Ghassemi,Nigam H. Shah 机构:Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, Department of Computer Science, University of Toronto, Toronto, Ontario, Canada, Department of Computer Science, Stanford University, Stanford, California , USA 摘要:在患者群体中平均准确的临床结果预测模型可能在某些亚群体中表现不佳,可能导致或加剧医疗服务获得和质量方面的不平等。旨在最大化子种群最坏情况下模型性能的模型训练方法,如分布式鲁棒优化(DRO),试图在不引入额外危害的情况下解决该问题。我们对DRO和标准学习程序的几种变体进行了大规模实证研究,以确定模型开发和选择方法,与从电子健康记录数据学习预测模型的标准方法相比,这些方法能够持续改善亚群体的分类和最坏情况表现。在我们的评估过程中,我们引入了DRO方法的扩展,该扩展允许指定用于评估最坏情况性能的指标。我们对预测住院死亡率、住院时间延长和30天再入院的模型进行分析,并使用重症监护数据预测住院死亡率。我们发现,除了相对较少的例外,对于所检查的每个患者亚群,没有任何方法比使用整个训练数据集的标准学习程序表现得更好。这些结果表明,当有兴趣改善患者亚群的模型性能,使其超出标准实践所能达到的水平时,可能需要通过隐式或显式增加有效样本量的技术来实现。 摘要:Predictive models for clinical outcomes that are accurate on average in a patient population may underperform drastically for some subpopulations, potentially introducing or reinforcing inequities in care access and quality. Model training approaches that aim to maximize worst-case model performance across subpopulations, such as distributionally robust optimization (DRO), attempt to address this problem without introducing additional harms. We conduct a large-scale empirical study of DRO and several variations of standard learning procedures to identify approaches for model development and selection that consistently improve disaggregated and worst-case performance over subpopulations compared to standard approaches for learning predictive models from electronic health records data. In the course of our evaluation, we introduce an extension to DRO approaches that allows for specification of the metric used to assess worst-case performance. We conduct the analysis for models that predict in-hospital mortality, prolonged length of stay, and 30-day readmission for inpatient admissions, and predict in-hospital mortality using intensive care data. We find that, with relatively few exceptions, no approach performs better, for each patient subpopulation examined, than standard learning procedures using the entire training dataset. These results imply that when it is of interest to improve model performance for patient subpopulations beyond what can be achieved with standard practices, it may be necessary to do so via techniques that implicitly or explicitly increase the effective sample size.
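DRO 的一个常见实例是 group DRO:按亚群分别计算损失并优化最坏的组。下面给出该目标的 PyTorch 最小示意(仅示意思想,与论文评测的具体 DRO 变体及其可配置的最坏情况指标并不相同;模型、数据与组数均为占位):

```python
# 示意:group DRO,即最小化各亚群中最坏的组损失
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(64, 10)
y = torch.randint(0, 2, (64, 1)).float()
g = torch.randint(0, 3, (64,))             # 占位的亚群标签(3 个组)

group_losses = []
for k in range(3):
    mask = g == k
    if mask.any():
        group_losses.append(bce(model(x[mask]), y[mask]))
loss = torch.stack(group_losses).max()     # 只对最坏组的损失做优化
opt.zero_grad(); loss.backward(); opt.step()
```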

其他神经网络|深度学习|模型|建模(11篇)

【1】 CAPE: Context-Aware Private Embeddings for Private Language Learning 标题:CAPE:面向私人语言学习的上下文感知私人嵌入 链接:https://arxiv.org/abs/2108.12318

作者:Richard Plant,Dimitra Gkatzia,Valerio Giuffrida 机构:Edinburgh Napier University 备注:Accepted into EMNLP21 main conference 摘要:基于深度学习的语言模型在许多应用中取得了最先进的成果,包括情感分析、主题标记、意图分类等。使用这些模型获取文本表示或嵌入,就有可能对从语言和上下文线索中获得的个人识别信息进行编码,这可能会对声誉或隐私造成风险。为了改善这些问题,我们提出了上下文感知私有嵌入(CAPE),这是一种在嵌入训练期间保护隐私的新方法。为了维护文本表示的隐私,CAPE通过差异隐私应用校准噪声,在隐藏敏感信息的同时保留编码语义链接。此外,CAPE采用了一种对抗性训练制度,该制度掩盖了已确定的私人变量。实验结果表明,该方法比单次干预都能更好地减少私有信息泄漏。 摘要:Deep learning-based language models have achieved state-of-the-art results in a number of applications including sentiment analysis, topic labelling, intent classification and others. Obtaining text representations or embeddings using these models presents the possibility of encoding personally identifiable information learned from language and context cues that may present a risk to reputation or privacy. To ameliorate these issues, we propose Context-Aware Private Embeddings (CAPE), a novel approach which preserves privacy during training of embeddings. To maintain the privacy of text representations, CAPE applies calibrated noise through differential privacy, preserving the encoded semantic links while obscuring sensitive information. In addition, CAPE employs an adversarial training regime that obscures identified private variables. Experimental results demonstrate that the proposed approach reduces private information leakage better than either single intervention.
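CAPE 的两个要素是"经差分隐私校准的噪声注入"与"对已识别私有变量的对抗训练"。下面给出这两步如何并入一次前向/反向传播的 PyTorch 骨架示意(编码器结构、拉普拉斯噪声尺度、对抗头等均为假设,并非论文的完整配方):

```python
# 示意:嵌入加校准噪声 + 梯度反转对抗私有属性预测
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x
    @staticmethod
    def backward(ctx, grad):
        return -grad                        # 反转梯度,使编码器"骗过"对抗头

encoder = nn.Linear(100, 32)                # 占位的文本编码器
task_head = nn.Linear(32, 2)                # 下游任务头
adv_head = nn.Linear(32, 2)                 # 试图预测私有属性的对抗头

x = torch.randn(16, 100)
y_task = torch.randint(0, 2, (16,))
y_priv = torch.randint(0, 2, (16,))

z = encoder(x)
scale = 0.1                                 # 假设的噪声尺度(应由隐私预算校准)
z_noisy = z + torch.distributions.Laplace(0.0, scale).sample(z.shape)

ce = nn.CrossEntropyLoss()
loss = ce(task_head(z_noisy), y_task) \
     + ce(adv_head(GradReverse.apply(z_noisy)), y_priv)
loss.backward()
```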

【2】 Lifelong Infinite Mixture Model Based on Knowledge-Driven Dirichlet Process 标题:基于知识驱动Dirichlet过程的终身无限混合模型 链接:https://arxiv.org/abs/2108.12278

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York, UK 备注:Accepted by International Conference on Computer Vision (ICCV 2021) 摘要:最近的终身学习研究提出让模型混合体不断增长,以适应越来越多的任务。所提出的方法在克服灾难性遗忘方面显示了良好的效果。然而,这些成功模型背后的理论仍然没有得到很好的理解。在本文中,我们通过基于模型生成数据的概率表示与目标数据集对应表示之间的差异距离(discrepancy distance)推导风险界,对终身学习模型进行理论分析。受理论分析的启发,我们引入了一种新的终身学习方法,即终身无限混合(LIMix)模型,该模型可以自动扩展其网络结构或选择适当的组件来调整其参数以学习新任务,同时保留其先前学习的信息。我们建议通过Dirichlet过程引入知识,使用门控机制计算先前学习并存储在每个组件中的知识与新数据集之间的依赖关系。此外,我们还训练了一个紧凑的学生模型,该模型可以随着时间的推移积累跨域表示并进行快速推断。代码可在 https://github.com/dtuzi123/Lifelong-infinite-mixture-model 获取。 摘要:Recent research efforts in lifelong learning propose to grow a mixture of models to adapt to an increasing number of tasks. The proposed methodology shows promising results in overcoming catastrophic forgetting. However, the theory behind these successful models is still not well understood. In this paper, we perform the theoretical analysis for lifelong learning models by deriving the risk bounds based on the discrepancy distance between the probabilistic representation of data generated by the model and that corresponding to the target dataset. Inspired by the theoretical analysis, we introduce a new lifelong learning approach, namely the Lifelong Infinite Mixture (LIMix) model, which can automatically expand its network architectures or choose an appropriate component to adapt its parameters for learning a new task, while preserving its previously learnt information. We propose to incorporate the knowledge by means of Dirichlet processes by using a gating mechanism which computes the dependence between the knowledge learnt previously and stored in each component, and a new set of data. Besides, we train a compact Student model which can accumulate cross-domain representations over time and make quick inferences. The code is available at https://github.com/dtuzi123/Lifelong-infinite-mixture-model.

【3】 Quantum Machine Learning for Health State Diagnosis and Prognostics 标题:用于健康状态诊断和预后的量子机器学习 链接:https://arxiv.org/abs/2108.12265

作者:Gabriel San Martín,Enrique López Droguett 机构:Department of Civil and Environmental Engineering, University of California Los Angeles, USA. E-mail:, Department of Civil and Environmental Engineering, and Garrick Institute for the Risk Sciences, University of 备注:Pre-print for RAMS 2022 Conference 摘要:量子计算是一个新的领域,由于其表现力强、灵活性强以及在速度和可扩展性方面的良好结果,近年来吸引了众多领域的研究人员。自2020年以来,全球各地的实验室开始试验机器学习和量子计算并置的模型。量子处理单元(QPU)通过开放API(如IBM的Qiskit)提供给一般科学界,激发了人们对开发和测试解决旧问题的新方法的兴趣。在本文中,我们提出了一个用于健康状态诊断和预测的混合量子机器学习框架。该框架以一个涉及滚珠轴承数据集的问题为例。据我们所知,这是第一次尝试收获和利用量子计算来开发和应用混合量子经典机器学习方法来解决预测和健康管理(PHM)问题。我们希望本文能在风险和可靠性领域开创量子机器学习算法的探索和应用。 摘要:Quantum computing is a new field that has recently attracted researchers from a broad range of fields due to its representation power, flexibility and promising results in both speed and scalability. Since 2020, laboratories around the globe have started to experiment with models that lie in the juxtaposition between machine learning and quantum computing. The availability of quantum processing units (QPUs) to the general scientific community through open APIs (e.g., Qiskit from IBM) have kindled the interest in developing and testing new approaches to old problems. In this paper, we present a hybrid quantum machine learning framework for health state diagnostics and prognostics. The framework is exemplified using a problem involving ball bearings dataset. To the best of our knowledge, this is the first attempt to harvest and leverage quantum computing to develop and apply a hybrid quantum-classical machine learning approach to a prognostics and health management (PHM) problem. We hope that this paper initiates the exploration and application of quantum machine learning algorithms in areas of risk and reliability.

【4】 This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation 标题:这看起来更像是:通过原型关联传播增强自我解释模型 链接:https://arxiv.org/abs/2108.12204

作者:Srishti Gautam,Marina M. -C. Höhne,Stine Hansen,Robert Jenssen,Michael Kampffmeyer 摘要:当前的机器学习模型在解决各种各样的现实问题方面表现出了很高的效率。然而,它们的黑匣子特性对理解和跟踪底层决策策略提出了重大挑战。作为一种补救措施,许多事后解释和自解释方法已经被开发出来来解释模型的行为。此外,这些方法还支持将模型可以学习的工件识别为类相关特征。在这项工作中,我们提供了一个自解释网络ProtoPNet的详细案例研究,该网络存在一系列伪影。因此,我们确定了ProtoPNet的主要缺点,特别是它的粗糙和空间不精确的解释。我们通过引入原型关联传播(PRP)来解决这些限制,PRP是一种生成更精确的模型感知解释的新方法。此外,为了获得干净的数据集,我们建议使用多视图聚类策略,使用PRP解释来分离伪影图像,从而抑制模型中潜在的伪影学习。 摘要:Current machine learning models have shown high efficiency in solving a wide variety of real-world problems. However, their black box character poses a major challenge for the understanding and traceability of the underlying decision-making strategies. As a remedy, many post-hoc explanation and self-explanatory methods have been developed to interpret the models' behavior. These methods, in addition, enable the identification of artifacts that can be learned by the model as class-relevant features. In this work, we provide a detailed case study of the self-explaining network, ProtoPNet, in the presence of a spectrum of artifacts. Accordingly, we identify the main drawbacks of ProtoPNet, especially, its coarse and spatially imprecise explanations. We address these limitations by introducing Prototypical Relevance Propagation (PRP), a novel method for generating more precise model-aware explanations. Furthermore, in order to obtain a clean dataset, we propose to use multi-view clustering strategies for segregating the artifact images using the PRP explanations, thereby suppressing the potential artifact learning in the models.

【5】 Learning primal-dual sparse kernel machines 标题:学习原始-对偶稀疏核机 链接:https://arxiv.org/abs/2108.12199

作者:Riikka Huusari,Sahely Bhadra,Cécile Capponi,Hachem Kadri,Juho Rousu 机构:Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Espoo, Finland, Indian Institute of Technology, Palakkad, Kerala, India, QARMA, LIS, Aix-Marseille University, Marseille, France 摘要:传统上,核方法依赖于表示定理(representer theorem),该定理指出,学习问题的解可表示为映射到再生核希尔伯特空间(RKHS)中的训练数据的线性组合。该定理虽然在理论上很优雅,但不利于算法在大型数据集上的可扩展性,也不利于所学函数的可解释性。在本文中,我们不使用传统的表示定理,而是建议在RKHS中搜索在原始数据空间中具有原像分解的解,其中的元素不必对应训练集中的元素。我们基于梯度的优化方法进而在输入空间中对可能稀疏的元素进行优化,使我们能够获得兼具原始稀疏性与对偶稀疏性的基于核的模型。我们通过Rademacher界从理论上论证了所提方法的泛化能力。实验表明,与传统的基于核的模型相比,该方法具有更好的可扩展性和可解释性,且精度相当。 摘要:Traditionally, kernel methods rely on the representer theorem which states that the solution to a learning problem is obtained as a linear combination of the data mapped into the reproducing kernel Hilbert space (RKHS). While elegant from theoretical point of view, the theorem is prohibitive for algorithms' scalability to large datasets, and the interpretability of the learned function. In this paper, instead of using the traditional representer theorem, we propose to search for a solution in RKHS that has a pre-image decomposition in the original data space, where the elements don't necessarily correspond to the elements in the training set. Our gradient-based optimisation method then hinges on optimising over possibly sparse elements in the input space, and enables us to obtain a kernel-based model with both primal and dual sparsity. We give theoretical justification on the proposed method's generalization ability via a Rademacher bound. Our experiments demonstrate a better scalability and interpretability with accuracy on par with the traditional kernel-based models.
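论文的关键思想是:诱导点不必取自训练集,而是作为输入空间中的自由变量与系数一同学习,并对系数施加 L1 以获得稀疏性。下面是该思想的一个梯度下降最小示意(RBF 核;诱导点个数 M、强度 lam 等均为假设,并非论文的具体优化流程):

```python
# 示意:同时优化输入空间诱导点 Z 与稀疏系数 a 的核回归
import torch

def rbf(X, Z, gamma=1.0):
    return torch.exp(-gamma * torch.cdist(X, Z) ** 2)

torch.manual_seed(0)
X = torch.rand(200, 2)
y = torch.sin(4 * X[:, 0]) + 0.1 * torch.randn(200)

M, lam = 20, 1e-3                           # 诱导点个数与 L1 强度(假设值)
Z = torch.rand(M, 2, requires_grad=True)    # 诱导点:不必来自训练集
a = torch.zeros(M, requires_grad=True)      # 对偶系数,施加 L1 稀疏
opt = torch.optim.Adam([Z, a], lr=0.05)

for step in range(500):
    pred = rbf(X, Z) @ a
    loss = ((pred - y) ** 2).mean() + lam * a.abs().sum()
    opt.zero_grad(); loss.backward(); opt.step()

print("非零系数个数(|a|>1e-3):", int((a.abs() > 1e-3).sum()))
```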

【6】 Canoe : A System for Collaborative Learning for Neural Nets 标题:CANOE:一种面向神经网络的协作学习系统 链接:https://arxiv.org/abs/2108.12124

作者:Harshit Daga,Yiwen Chen,Aastha Agrawal,Ada Gavrilovska 机构:Georgia Institute of Technology, Salesforce, Amazon 摘要:对于边缘计算等高度分布式的环境,协作学习方法避免了对全局共享模型的依赖,而是支持为每个位置定制的模型。为个人学习环境创建定制模型可以减少数据传输量,而同龄人之间的协作可以提供可接受的模型性能。然而,协作假设了知识转移机制的可用性,这对于深度学习模型来说并非微不足道,因为在深度学习模型中,知识不容易归因于精确的模型切片。我们提出了Canoe——一个促进神经网络知识转移的框架。Canoe为从辅助节点的神经网络中动态提取重要参数提供了新的系统支持,并将其与基于多模型boosting的方法结合使用,以提高目标节点的预测性能。使用不同的PyTorch和TensorFlow神经网络模型对Canoe进行的评估表明,与单独学习相比,知识转移机制提高了模型对变化的适应能力,高达3.5倍,同时与联合学习相比,数据移动成本降低了几个数量级。 摘要:For highly distributed environments such as edge computing, collaborative learning approaches eschew the dependence on a global, shared model, in favor of models tailored for each location. Creating tailored models for individual learning contexts reduces the amount of data transfer, while collaboration among peers provides acceptable model performance. Collaboration assumes, however, the availability of knowledge transfer mechanisms, which are not trivial for deep learning models where knowledge isn't easily attributed to precise model slices. We present Canoe - a framework that facilitates knowledge transfer for neural networks. Canoe provides new system support for dynamically extracting significant parameters from a helper node's neural network and uses this with a multi-model boosting-based approach to improve the predictive performance of the target node. The evaluation of Canoe with different PyTorch and TensorFlow neural network models demonstrates that the knowledge transfer mechanism improves the model's adaptiveness to changes up to 3.5X compared to learning in isolation, while affording several magnitudes reduction in data movement costs compared to federated learning.

【7】 Subjective Learning for Open-Ended Data 标题:开放式数据的主观学习 链接:https://arxiv.org/abs/2108.12113

作者:Tianren Zhang,Yizhou Jiang,Xin Su,Shangqi Guo,Feng Chen 机构: Department of Automation, Tsinghua University, Beijing Innovation Center for Future Chip, LSBDPA Beijing Key Laboratory, Technology Architecture Department, WeChat, Tencent 摘要:传统的机器学习方法通常假设数据根据任务进行分割,并且每个任务中的数据可以由单个目标函数建模。但是,在没有手动任务定义的开放式环境中,这种假设是无效的。在本文中,我们提出了一种新的监督学习范式,即从开放数据中学习。开放式数据本质上需要多个单值确定性映射函数来捕获其所有输入-输出关系,这与传统的监督数据表现出本质的结构差异。我们用一个称为映射秩的新概念正式阐述了这种结构特性,并表明开放数据对传统的监督学习构成了一个基本的困难,因为如果数据的映射秩大于1,不同的数据样本可能会相互冲突。为了解决这一问题,我们设计了一个开放式监督学习(OSL)框架,其关键创新是一个主观功能,该功能自动在多个候选模型之间分配数据以解决冲突,从而形成一个自然的认知层次。我们从理论和实证两方面证明了OSL的有效性,并表明OSL在没有任务级监督的情况下实现了类人的任务认知。 摘要:Conventional machine learning methods typically assume that data is split according to tasks, and the data in each task can be modeled by a single target function. However, this assumption is invalid in open-ended environments where no manual task definition is available. In this paper, we present a novel supervised learning paradigm of learning from open-ended data. Open-ended data inherently requires multiple single-valued deterministic mapping functions to capture all its input-output relations, exhibiting an essential structural difference from conventional supervised data. We formally expound this structural property with a novel concept termed as mapping rank, and show that open-ended data poses a fundamental difficulty for conventional supervised learning, since different data samples may conflict with each other if the mapping rank of data is larger than one. To address this issue, we devise an Open-ended Supervised Learning (OSL) framework, of which the key innovation is a subjective function that automatically allocates the data among multiple candidate models to resolve the conflict, developing a natural cognition hierarchy. We demonstrate the efficacy of OSL both theoretically and empirically, and show that OSL achieves human-like task cognition without task-level supervision.
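OSL 的"主观函数"负责把相互冲突的样本分配给不同候选模型。下面以"映射秩为 2"的玩具数据给出"按逐样本损失做硬分配、再分别训练"的最小示意(这种最低损失硬分配是对论文机制的简化假设,网络结构与数据均为占位):

```python
# 示意:开放式数据 = 同一输入存在两个冲突映射;按逐样本损失把数据分给候选模型
import torch
import torch.nn as nn

torch.manual_seed(0)
x = torch.rand(256, 1) * 6 - 3
flip = torch.randint(0, 2, (256, 1)).float() * 2 - 1
y = flip * torch.sin(x)                       # 映射秩为 2:y = ±sin(x)

models = [nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
          for _ in range(2)]
opts = [torch.optim.Adam(m.parameters(), lr=1e-2) for m in models]

for step in range(300):
    with torch.no_grad():
        errs = torch.cat([(m(x) - y) ** 2 for m in models], dim=1)
        assign = errs.argmin(dim=1)           # "主观"分配:谁拟合得好归谁
    for k, (m, opt) in enumerate(zip(models, opts)):
        mask = assign == k
        if mask.any():
            loss = ((m(x[mask]) - y[mask]) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
```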

【8】 Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement 标题:全注意力双向深度学习结构在单通道语音增强中的应用 链接:https://arxiv.org/abs/2108.12105

作者:Yuzi Yan,Wei-Qiang Zhang,Michael T. Johnson 机构:Beijing National Research Center for Information Science and Technology, Department of Electronic Engineering, Tsinghua University, Beijing , China, Electrical and Computer Engineering, College of Engineering, University of Kentucky, Lexington, KY ,-, USA 备注:4 pages 摘要:语音增强作为语音识别和语音合成等重要技术的基石,是音频信号处理中的一个关键领域。本文提出了一种新的语音增强深度学习结构。该模型将“完全”注意机制引入到双向序列对序列方法中,以利用每个焦帧后的潜在信息。这是以前基于注意的RNN方法的扩展。与OM-LSA、CNN-LSTM、T-GSA和基于单向注意的LSTM基线相比,所提出的基于双向注意的架构在语音质量(PESQ)方面取得了更好的性能。 摘要:As the cornerstone of other important technologies, such as speech recognition and speech synthesis, speech enhancement is a critical area in audio signal processing. In this paper, a new deep learning structure for speech enhancement is demonstrated. The model introduces a "full" attention mechanism to a bidirectional sequence-to-sequence method to make use of latent information after each focal frame. This is an extension of the previous attention-based RNN method. The proposed bidirectional attention-based architecture achieves better performance in terms of speech quality (PESQ), compared with OM-LSA, CNN-LSTM, T-GSA and the unidirectional attention-based LSTM baseline.

【9】 Learning to Give Checkable Answers with Prover-Verifier Games 标题:用验证者-验证者博弈学习给出可检查的答案 链接:https://arxiv.org/abs/2108.12099

作者:Cem Anil,Guodong Zhang,Yuhuai Wu,Roger Grosse 摘要:我们判断何时可以信任机器学习系统所做决策的能力,没有跟上其性能的惊人提升,这限制了它们在高风险领域的适用性。我们引入了证明者-验证者博弈(Prover-Verifier Games,PVGs),这是一个鼓励学习智能体以可验证方式解决决策问题的博弈论框架。PVG由两个目标相互竞争的学习者组成:一个可信的验证者(verifier)网络试图选出正确答案,另一个能力更强但不可信的证明者(prover)网络则试图说服验证者接受某个特定答案,而不论其正确与否。目标是让一个可靠的论证协议从该博弈中涌现。我们分析了该框架的各种变体,包括同时博弈和序贯博弈,并把范围缩小到可证明具有期望均衡的博弈子集。我们为两个算法任务构建了PVG实例,并表明在实践中,验证者学到了一个健壮的决策规则,能够从不可信的证明者那里获得有用且可靠的信息。重要的是,即使验证者被冻结、证明者的消息被直接优化以说服验证者,该协议仍然有效。 摘要:Our ability to know when to trust the decisions made by machine learning systems has not kept up with the staggering improvements in their performance, limiting their applicability in high-stakes domains. We introduce Prover-Verifier Games (PVGs), a game-theoretic framework to encourage learning agents to solve decision problems in a verifiable manner. The PVG consists of two learners with competing objectives: a trusted verifier network tries to choose the correct answer, and a more powerful but untrusted prover network attempts to persuade the verifier of a particular answer, regardless of its correctness. The goal is for a reliable justification protocol to emerge from this game. We analyze variants of the framework, including simultaneous and sequential games, and narrow the space down to a subset of games which provably have the desired equilibria. We develop instantiations of the PVG for two algorithmic tasks, and show that in practice, the verifier learns a robust decision rule that is able to receive useful and reliable information from an untrusted prover. Importantly, the protocol still works even when the verifier is frozen and the prover's messages are directly optimized to convince the verifier.

【10】 Simulating progressive intramural damage leading to aortic dissection using an operator-regression neural network 标题:用算子回归神经网络模拟导致主动脉夹层的进行性室壁内损伤 链接:https://arxiv.org/abs/2108.11985

作者:Minglang Yin,Ehsan Ban,Bruno V. Rego,Enrui Zhang,Cristina Cavinato,Jay D. Humphrey,George Em Karniadakis 机构:Center for Biomedical Engineering, Brown University, Providence, RI, School of Engineering, Brown University, Providence, RI, Department of Biomedical Engineering, Yale University, New Haven, CT 摘要:主动脉夹层通过管壁中膜层的分层而扩展。尽管该过程很复杂,体外(in vitro)与计算机模拟(in silico)研究已通过液体注射对壁内空间进行准静态加压来考察夹层的进展;结果表明,夹层扩展的差异倾向可能受到连接相邻弹性片层的、结构上重要的层间支柱空间分布的影响。特别是,在夹层扩展过程中,不同的组织显微结构可能导致不同的力学行为,包括注入流体的压力-体积关系和相邻片层之间的位移场。在本研究中,我们使用DeepONet(一种新的算子回归神经网络)为不同支柱分布下的分层过程开发了数据驱动的替代模型。替代模型经过训练,在给定支柱空间分布的情况下预测注入流体的压力-体积曲线和管壁的损伤进展场,训练数据由相场有限元模型在计算机中生成。结果表明,DeepONet能够对各种支柱分布给出准确预测,说明这种分支-主干(branch-trunk)复合神经网络能够有效提取不同微观结构与其力学性能之间的潜在函数关系。更广泛地说,DeepONet可以促进基于替代模型的分析,以量化生物变异性、改进逆向设计,并基于多模态实验数据预测力学性能。 摘要:Aortic dissection progresses via delamination of the medial layer of the wall. Notwithstanding the complexity of this process, insight has been gleaned by studying in vitro and in silico the progression of dissection driven by quasi-static pressurization of the intramural space by fluid injection, which demonstrates that the differential propensity of dissection can be affected by spatial distributions of structurally significant interlamellar struts that connect adjacent elastic lamellae. In particular, diverse histological microstructures may lead to differential mechanical behavior during dissection, including the pressure--volume relationship of the injected fluid and the displacement field between adjacent lamellae. In this study, we develop a data-driven surrogate model for the delamination process for differential strut distributions using DeepONet, a new operator--regression neural network. The surrogate model is trained to predict the pressure--volume curve of the injected fluid and the damage progression field of the wall given a spatial distribution of struts, with in silico data generated with a phase-field finite element model. The results show that DeepONet can provide accurate predictions for diverse strut distributions, indicating that this composite branch-trunk neural network can effectively extract the underlying functional relationship between distinctive microstructures and their mechanical properties. More broadly, DeepONet can facilitate surrogate model-based analyses to quantify biological variability, improve inverse design, and predict mechanical properties based on multi-modality experimental data.
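DeepONet 由编码输入函数采样值的 branch 网与编码查询坐标的 trunk 网组成,算子输出为两路特征的内积。下面给出该结构的 PyTorch 最小示意(网络宽度、传感器个数等均为假设,输入用随机数占位,与论文的支柱分布数据无关):

```python
# 示意:DeepONet = branch(输入函数的 m 个采样值) · trunk(查询点坐标)
import torch
import torch.nn as nn

class DeepONet(nn.Module):
    def __init__(self, m_sensors=50, p=32):
        super().__init__()
        self.branch = nn.Sequential(nn.Linear(m_sensors, 64), nn.Tanh(),
                                    nn.Linear(64, p))
        self.trunk = nn.Sequential(nn.Linear(1, 64), nn.Tanh(),
                                   nn.Linear(64, p))

    def forward(self, u_sensors, y_query):
        b = self.branch(u_sensors)          # (batch, p)
        t = self.trunk(y_query)             # (n_query, p)
        return b @ t.T                      # (batch, n_query),即 G(u)(y)

net = DeepONet()
u = torch.randn(8, 50)                      # 8 个输入函数,各采样 50 个点
y = torch.linspace(0, 1, 100).unsqueeze(1)  # 100 个查询坐标
print(net(u, y).shape)                      # torch.Size([8, 100])
```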

【11】 A Guide to Reproducible Research in Signal Processing and Machine Learning 标题:信号处理和机器学习中的可重复性研究指南 链接:https://arxiv.org/abs/2108.12383

作者:Joseph Shenouda,Waheed U. Bajwa 备注:19 pages; preprint 摘要:再现性是一个日益增长的问题,已在计算研究人员以及信号处理和机器学习研究界进行了广泛的研究。然而,随着信号处理和机器学习研究领域的不断变化,在创建可复制实验方面出现了新的障碍和未知的挑战。由于这些新的挑战,大多数实验已经变得很难,如果不是不可能的话,由一个独立的研究人员进行复制。2016年,《自然》杂志开展的一项调查发现,50%的研究人员无法复制自己的实验。虽然再现性问题已在文献中讨论过,特别是在信号处理界,但大多数研究人员仍不清楚在不影响他们进行研究的主要责任的情况下,确保再现性的最佳实践是什么。我们认为,尽管研究人员了解使实验可再现性的重要性,但由于缺乏一套明确的标准和工具,因此很难在大多数实验室中纳入良好的再现性实践。正是在这方面,我们的目标是为信号处理研究人员提供一套实用的工具和策略,以帮助缓解产生可重复计算实验的许多障碍。 摘要:Reproducibility is a growing problem that has been extensively studied among computational researchers and within the signal processing and machine learning research community. However, with the changing landscape of signal processing and machine learning research come new obstacles and unseen challenges in creating reproducible experiments. Due to these new challenges most experiments have become difficult, if not impossible, to be reproduced by an independent researcher. In 2016 a survey conducted by the journal Nature found that 50% of researchers were unable to reproduce their own experiments. While the issue of reproducibility has been discussed in the literature and specifically within the signal processing community, it is still unclear to most researchers what are the best practices to ensure reproducibility without impinging on their primary responsibility of conducting research. We feel that although researchers understand the importance of making experiments reproducible, the lack of a clear set of standards and tools makes it difficult to incorporate good reproducibility practices in most labs. It is in this regard that we aim to present signal processing researchers with a set of practical tools and strategies that can help mitigate many of the obstacles to producing reproducible computational experiments.
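作为文中所倡导的"可复现实验的实用工具"的一个常见例子,下面给出固定各随机源种子的片段(这是社区通行做法的示例,并非该文列出的具体清单;涉及 PyTorch、NumPy 与内置 random):

```python
# 示意:固定各随机源的种子,是计算实验可复现的第一步
import os
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    os.environ["PYTHONHASHSEED"] = str(seed)
    # 让 cuDNN 使用确定性算法(可能牺牲速度)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
```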

其他(9篇)

【1】 A Perceptually-Validated Metric for Crowd Trajectory Quality Evaluation 标题:一种用于人群轨迹质量评价的感知验证性度量方法 链接:https://arxiv.org/abs/2108.12346

作者:Beatriz Cabrero Daniel,Ricardo Marques,Ludovic Hoyet,Julien Pettré,Josep Blat 机构: Universitat Pompeu Fabra, Universitat de Barcelona 备注:17 pages, to appear on PACMGIT 摘要:模拟人群需要控制大量的轨迹,通常使用人群运动算法执行,需要找到合适的参数值。通过知觉实验或与真实人群轨迹的比较,研究了模拟技术参数值与生成轨迹质量之间的关系。在本文中,我们整合了这两种策略。提出了一种质量度量QF,用于从参考数据中提取影响轨迹真实感的最显著特征。QF加权并结合基于轨迹的几个单独、局部和全局属性的成本函数。这些轨迹特征选自文献和专家访谈。为了验证QF捕捉感知轨迹质量的能力,我们进行了一个在线实验,证明了自动质量分数与非专家用户之间的高度一致性。为了进一步证明QF的有用性,我们在一个无数据参数调整应用程序中使用它,该应用程序能够调整任何参数化微观人群模拟模型,该模型为角色输出独立的轨迹。调谐人群运动模型的学习参数保持参考数据的影响,参考数据用于加权QF项。 摘要:Simulating crowds requires controlling a very large number of trajectories and is usually performed using crowd motion algorithms for which appropriate parameter values need to be found. The study of the relation between parametric values for simulation techniques and the quality of the resulting trajectories has been studied either through perceptual experiments or by comparison with real crowd trajectories. In this paper, we integrate both strategies. A quality metric, QF, is proposed to abstract from reference data while capturing the most salient features that affect the perception of trajectory realism. QF weights and combines cost functions that are based on several individual, local and global properties of trajectories. These trajectory features are selected from the literature and from interviews with experts. To validate the capacity of QF to capture perceived trajectory quality, we conduct an online experiment that demonstrates the high agreement between the automatic quality score and non-expert users. To further demonstrate the usefulness of QF, we use it in a data-free parameter tuning application able to tune any parametric microscopic crowd simulation model that outputs independent trajectories for characters. The learnt parameters for the tuned crowd motion model maintain the influence of the reference data which was used to weight the terms of QF.

【2】 LassoLayer: Nonlinear Feature Selection by Switching One-to-one Links 标题:LassoLayer:一对一链路切换的非线性特征选择 链接:https://arxiv.org/abs/2108.12165

作者:Akihito Sudo,Teng Teck Hou,Masaki Yamaguchi,Yoshinori Tone 机构:Shizuoka University, Japan, ST Engineering Ltd., Singapore, yamaguchi.masaki., JAVIS CO., LTD., Vietnam. 摘要:随着人们对解决更复杂问题的渴望,特征选择方法变得越来越重要。特征选择方法可分为包装方法、过滤方法和嵌入方法。Lasso作为一种强大的嵌入式特征选择方法,引起了众多研究者的关注。然而,作为一种线性方法,套索的适用性受到限制。在这项工作中,我们提出了LassoLayer,它是一对一连接的,并通过L1优化进行训练,从而去掉不必要的预测单元。对于非线性特征选择,我们构建了LassoMLP:配备LassoLayer作为第一层的网络。因为我们可以在任何网络结构中插入LassoLayer,所以它可以利用神经网络的强度,适用于需要特征选择的任务。我们通过回归和分类任务评估LassoMLP在特征选择中的作用。LassoMLP接收的功能包括大量噪音因素,这些因素对过度装配有害。在使用MNIST数据集的实验中,我们确认LassoMLP优于最先进的方法。 摘要:Along with the desire to address more complex problems, feature selection methods have gained in importance. Feature selection methods can be classified into wrapper method, filter method, and embedded method. Being a powerful embedded feature selection method, Lasso has attracted the attention of many researchers. However, as a linear approach, the applicability of Lasso has been limited. In this work, we propose LassoLayer that is one-to-one connected and trained by L1 optimization, which work to drop out unnecessary units for prediction. For nonlinear feature selections, we build LassoMLP: the network equipped with LassoLayer as its first layer. Because we can insert LassoLayer in any network structure, it can harness the strength of neural network suitable for tasks where feature selection is needed. We evaluate LassoMLP in feature selection with regression and classification tasks. LassoMLP receives features including considerable numbers of noisy factors that is harmful for overfitting. In the experiments using MNIST dataset, we confirm that LassoMLP outperforms the state-of-the-art method.
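论文所述"一对一连接、经 L1 优化训练"的 LassoLayer,可理解为一个逐特征的可学习缩放层;LassoMLP 即把它放在网络第一层。下面给出 PyTorch 最小示意(λ 等超参数与数据均为假设):

```python
# 示意:一对一连接的 LassoLayer(逐特征缩放)+ L1 惩罚实现非线性特征选择
import torch
import torch.nn as nn

class LassoLayer(nn.Module):
    def __init__(self, d):
        super().__init__()
        self.w = nn.Parameter(torch.ones(d))   # 每个输入特征一个权重
    def forward(self, x):
        return x * self.w                      # 一对一连接:无跨特征混合

d, lam = 20, 1e-2
lasso = LassoLayer(d)
mlp = nn.Sequential(lasso, nn.Linear(d, 32), nn.ReLU(), nn.Linear(32, 1))
opt = torch.optim.Adam(mlp.parameters(), lr=1e-2)

x = torch.randn(128, d)
y = x[:, :2].sum(dim=1, keepdim=True)          # 只有前 2 个特征有用
for _ in range(300):
    loss = ((mlp(x) - y) ** 2).mean() + lam * lasso.w.abs().sum()
    opt.zero_grad(); loss.backward(); opt.step()
print("被保留的特征:", torch.nonzero(lasso.w.abs() > 0.1).flatten().tolist())
```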

【3】 An Introduction to Hamiltonian Monte Carlo Method for Sampling 标题:哈密顿蒙特卡罗抽样方法简介 链接:https://arxiv.org/abs/2108.12107

作者:Nisheeth K. Vishnoi 备注:This exposition is to supplement the talk by the author at the Bootcamp in the semester on Geometric Methods for Optimization and Sampling at the Simons Institute for the Theory of Computing 摘要:本文的目的是介绍哈密顿蒙特卡罗(HMC)方法——一种从吉布斯密度$\pi(x)\propto e^{-f(x)}$采样的、受哈密顿动力学启发的算法。我们关注"理想化"的情况,即可以精确计算连续轨迹的情形。我们证明了理想化HMC保持$\pi$不变,并在$f$强凸且光滑时证明了其收敛性。 摘要:The goal of this article is to introduce the Hamiltonian Monte Carlo (HMC) method -- a Hamiltonian dynamics-inspired algorithm for sampling from a Gibbs density $\pi(x) \propto e^{-f(x)}$. We focus on the "idealized" case, where one can compute continuous trajectories exactly. We show that idealized HMC preserves $\pi$ and we establish its convergence when $f$ is strongly convex and smooth.
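下面给出带蛙跳(leapfrog)积分器的 HMC 单步与采样循环的最小示意(目标取 $f(x)=\frac{1}{2}\|x\|^2$ 的二维高斯吉布斯密度;文中的"理想化"情形假定轨迹可精确积分,这里用蛙跳离散化并加 Metropolis 校正,步长与步数为假设):

```python
# 示意:HMC 采样 pi(x) ∝ exp(-f(x)),取 f(x) = 0.5 * ||x||^2
import numpy as np

def grad_f(x):                               # f 的梯度:此处即 x 本身
    return x

def hmc_step(x, eps=0.1, L=20, rng=np.random.default_rng()):
    p = rng.normal(size=x.shape)             # 每步重采样动量
    x_new, p_new = x.copy(), p.copy()
    for _ in range(L):                       # 蛙跳积分哈密顿动力学
        p_new -= 0.5 * eps * grad_f(x_new)
        x_new += eps * p_new
        p_new -= 0.5 * eps * grad_f(x_new)
    # Metropolis 校正(精确积分时能量守恒,此步恒接受)
    dH = (0.5 * x_new @ x_new + 0.5 * p_new @ p_new) \
       - (0.5 * x @ x + 0.5 * p @ p)
    return x_new if rng.random() < np.exp(-dH) else x

x, samples = np.zeros(2), []
for _ in range(2000):
    x = hmc_step(x)
    samples.append(x)
print("样本协方差(应接近单位阵):\n", np.cov(np.array(samples).T))
```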

【4】 A framework for massive scale personalized promotion 标题:一种大规模个性化推广的框架 链接:https://arxiv.org/abs/2108.12100

作者:Yitao Shen,Yue Wang,Xingyu Lu,Feng Qi,Jia Yan,Yixiang Mu,Yao Yang,YiFan Peng,Jinjie Gu 机构:Ant Financial Services Group, Hangzhou, Zhejiang, China, San Mateo, California, United States, Shanghai, China, School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China 摘要:构建面向消费者平台的技术公司可能会接触到大规模的用户群体。近年来,具有量化激励的促销已成为在此类平台上增加活跃用户的流行方法。一方面,增加用户活动可以引入网络效应,吸引广告受众,并产生其他效益。另一方面,大规模的推广会造成巨大的成本。因此,使促销活动在投资回报率(ROI)方面具有效率是许多公司非常感兴趣的。本文提出了一个实用的两阶段框架,可以优化各种大规模促销活动的投资回报率。在第一阶段,利用机器学习技术对用户的个人促销响应曲线进行建模。在第二阶段,业务目标和资源约束被描述为一个优化问题,其决策变量是给予每个用户多少激励。为了在第二阶段进行有效的优化,反事实预测和降噪对于第一阶段至关重要。我们利用现有的反事实预测技术来纠正数据中的处理偏差(treatment bias)。我们还引入了一种新的深度神经网络(DNN)结构,深度保序促销网络(deep-isotonic-promotion-network,DIPN),以减少促销响应曲线中的噪声。DIPN结构通过强制单调性(保序性)与平滑性,融入了我们对响应曲线形状的先验知识。在我们的实验中,它的性能超过了常规DNN和其他最先进的形状约束模型。 摘要:Technology companies building consumer-facing platforms may have access to massive-scale user population. In recent years, promotion with quantifiable incentive has become a popular approach for increasing active users on such platforms. On one hand, increased user activities can introduce network effect, bring in advertisement audience, and produce other benefits. On the other hand, massive-scale promotion causes massive cost. Therefore making promotion campaigns efficient in terms of return-on-investment (ROI) is of great interest to many companies. This paper proposes a practical two-stage framework that can optimize the ROI of various massive-scale promotion campaigns. In the first stage, users' personal promotion-response curves are modeled by machine learning techniques. In the second stage, business objectives and resource constraints are formulated into an optimization problem, the decision variables of which are how much incentive to give to each user. In order to do effective optimization in the second stage, counterfactual prediction and noise-reduction are essential for the first stage. We leverage existing counterfactual prediction techniques to correct treatment bias in data. We also introduce a novel deep neural network (DNN) architecture, the deep-isotonic-promotion-network (DIPN), to reduce noise in the promotion response curves. The DIPN architecture incorporates our prior knowledge of response curve shape, by enforcing isotonicity and smoothness. It out-performed regular DNN and other state-of-the-art shape-constrained models in our experiments.

【5】 An Automatic Image Content Retrieval Method for better Mobile Device Display User Experiences 标题:一种改善移动设备显示用户体验的自动图像内容检索方法 链接:https://arxiv.org/abs/2108.12068

作者:Alessandro Bruno 机构:Department of Computing and Informatics at Bournemouth University, Fern Barrow, Poole, Dorset, United Kingdom 备注:5 pages, 5 figures 摘要:越来越多的商用手机配备了集成的高分辨率数码相机。这为图像分析催生了一类新的专用应用,如移动视觉搜索、图像裁剪、目标检测、基于内容的图像检索、图像分类。本文提出了一种面向移动设备显示的图像内容检索与分类的移动应用,以丰富用户的视觉体验。该移动应用可以基于图像内容,通过视觉显著性方法提取一定数量的图像区域,该方法旨在从感知视角检测给定图像中最关键的区域。首先,使用2D显著性函数的局部极大值从感知角度提取最关键的区域。接下来,使用以阈值化显著性图的局部极大值为中心的边界框裁剪显著区域。然后,将每个裁剪出的图像区域送入基于SVM和SIFT描述符的图像分类系统,以检测其中的对象类别。使用ImageNet库作为语义类别分类的参考。采用Android平台在客户端-服务器架构上实现该移动应用:移动客户端将相机拍摄的照片发送到服务器,服务器处理图像并将结果(图像内容,如图像裁剪和相关目标类别)返回给移动客户端。该应用在数千张图片上运行,结果令人鼓舞,有望为移动显示带来更好的用户视觉体验。 摘要:A growing number of commercially available mobile phones come with integrated high-resolution digital cameras. That enables a new class of dedicated applications to image analysis such as mobile visual search, image cropping, object detection, content-based image retrieval, image classification. In this paper, a new mobile application for image content retrieval and classification for mobile device display is proposed to enrich the visual experience of users. The mobile application can extract a certain number of images based on the content of an image with visual saliency methods aiming at detecting the most critical regions in a given image from a perceptual viewpoint. First, the most critical areas from a perceptual perspective are extracted using the local maxima of a 2D saliency function. Next, a salient region is cropped using the bounding box centred on the local maxima of the thresholded Saliency Map of the image. Then, each image crop feeds into an Image Classification system based on SVM and SIFT descriptors to detect the class of object present in the image. ImageNet repository was used as the reference for semantic category classification. Android platform was used to implement the mobile application on a client-server architecture. A mobile client sends the photo taken by the camera to the server, which processes the image and returns the results (image contents such as image crops and related target classes) to the mobile client. The application was run on thousands of pictures and showed encouraging results towards a better user visual experience with mobile displays.

【6】 When and how epochwise double descent happens 标题:划时代的双重下降发生的时间和方式 链接:https://arxiv.org/abs/2108.12006

作者:Cory Stephenson,Tyler Lee 机构:Intel Labs 备注:15 Pages (main 11 pages, supplemental 4 pages), 5 figures 摘要:众所周知,随着参数数量的增加,深度神经网络会表现出"双下降"行为。最近的研究还表明,存在"逐轮(epochwise)双下降"效应:随着训练时间的增加,泛化误差先下降,再上升,最后再次下降。这带来了一个实际问题:训练所需时间很长,而基于验证性能的提前停止可能导致次优的泛化。在这项工作中,我们建立了一个可解析处理的逐轮双下降模型,使我们能够在理论上刻画这种效应何时可能发生。该模型基于这样一个假设:训练数据包含学习速度慢但信息量大的特征。随后我们通过实验证明,深度神经网络的行为与我们的理论模型相似。我们的结果表明,逐轮双下降的出现需要一定的临界噪声量;而当噪声高于第二个临界水平时,提前停止仍然有效。利用理论洞见,我们给出了两种消除逐轮双下降的方法:一种是从输入中去除学习速度慢的特征,但会降低泛化性能;另一种则修改训练动态,其泛化性能可与标准训练持平或更优。综上,我们的结果给出了一幅新图景,展示了逐轮双下降如何从训练动态与训练数据噪声的相互作用中产生。 摘要:Deep neural networks are known to exhibit a `double descent' behavior as the number of parameters increases. Recently, it has also been shown that an `epochwise double descent' effect exists in which the generalization error initially drops, then rises, and finally drops again with increasing training time. This presents a practical problem in that the amount of time required for training is long, and early stopping based on validation performance may result in suboptimal generalization. In this work we develop an analytically tractable model of epochwise double descent that allows us to characterise theoretically when this effect is likely to occur. This model is based on the hypothesis that the training data contains features that are slow to learn but informative. We then show experimentally that deep neural networks behave similarly to our theoretical model. Our findings indicate that epochwise double descent requires a critical amount of noise to occur, but above a second critical noise level early stopping remains effective. Using insights from theory, we give two methods by which epochwise double descent can be removed: one that removes slow to learn features from the input and reduces generalization performance, and another that instead modifies the training dynamics and matches or exceeds the generalization performance of standard training. Taken together, our results suggest a new picture of how epochwise double descent emerges from the interplay between the dynamics of training and noise in the training data.

【7】 Resource allocation method using tug-of-war-based synchronization 标题:一种基于拔河同步的资源分配方法 链接:https://arxiv.org/abs/2108.11979

作者:Song-Ju Kim,Hiroyuki Yasuda,Ryoma Kitagawa,Mikio Hasegawa 机构:† SOBIN Institute LLC, Kawanishi, Japan 备注:5 pages, 2 figures 摘要:我们提出了一种简单的基于拔河(TOW)动力学的信道分配方法,结合基于非线性振荡器同步的时间调度,有效地利用了无线通信中的空间(信道)和时间资源。这项研究表明,同步组(其中每个节点选择不同的信道)在相空间中分布不均匀,因此组之间的每个距离都大于影响区域。根据信道报酬,可以形成新的自组织时空模式进行资源分配。 摘要:We propose a simple channel-allocation method based on tug-of-war (TOW) dynamics, combined with the time scheduling based on nonlinear oscillator synchronization to efficiently use of the space (channel) and time resources in wireless communications. This study demonstrates that synchronization groups, where each node selects a different channel, are non-uniformly distributed in phase space such that every distance between groups is larger than the area of influence. New type of self-organized spatiotemporal patterns can be formed for resource allocation according to channel rewards.

【8】 JUWELS Booster -- A Supercomputer for Large-Scale AI Research 标题:JUWELS BOOSTER--一台用于大规模人工智能研究的超级计算机 链接:https://arxiv.org/abs/2108.11976

作者:Stefan Kesselheim,Andreas Herten,Kai Krajsek,Jan Ebert,Jenia Jitsev,Mehdi Cherti,Michael Langguth,Bing Gong,Scarlet Stadtler,Amirpasha Mozaffari,Gabriele Cavallaro,Rocco Sedona,Alexander Schug,Alexandre Strube,Roshni Kamath,Martin G. Schultz,Morris Riedel,Thomas Lippert 机构: Jülich Supercomputing Centre, Forschungszentrum Jülich GmbH, Germany, School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland, University of Duisburg-Essen, Germany 备注:12 pages, 5 figures. Accepted at ISC 2021, Workshop Deep Learning on Supercomputers. This is a duplicate submission as my previous submission is on hold for several weeks now and my attempts to contact the moderators failed 摘要:在这篇文章中,我们介绍JUWELS Booster,一个最近在Jülich超级计算中心投入使用的高性能计算系统。凭借其系统架构,最重要的是其大量强大的图形处理单元(GPU)以及通过InfiniBand的快速互连,它是大规模人工智能(AI)研究和应用的理想机器。我们详细介绍了它的系统架构、并行分布式模型训练,以及表明其卓越性能的基准测试。我们通过展示需要此类设施的各个科学领域的大规模人工智能研究亮点,展示了其研究应用潜力。 摘要:In this article, we present JUWELS Booster, a recently commissioned high-performance computing system at the Jülich Supercomputing Center. With its system architecture, most importantly its large number of powerful Graphics Processing Units (GPUs) and its fast interconnect via InfiniBand, it is an ideal machine for large-scale Artificial Intelligence (AI) research and applications. We detail its system architecture, parallel, distributed model training, and benchmarks indicating its outstanding performance. We exemplify its potential for research application by presenting large-scale AI research highlights from various scientific fields that require such a facility.

【9】 Multiple Hypothesis Testing Framework for Spatial Signals 标题:空间信号的多假设检验框架 链接:https://arxiv.org/abs/2108.12314

作者:Martin Gölz,Abdelhak M. Zoubir,Visa Koivunen 机构: Koivunen is with the Department of SignalProcessing and Acoustics, Aalto University 备注:Submitted to IEEE Transactions on Signal and Information Processing over Networks 摘要:识别空间感兴趣、不同或敌对行为区域的问题是涉及分布式多传感器系统的许多实际应用所固有的。在这项工作中,我们开发了一个基于多假设检验的通用框架来识别这些区域。假设监测环境为离散空间网格。识别与不同假设相关联的空间网格点,同时将错误发现率控制在预先指定的水平。使用大规模传感器网络获取测量值。我们提出了一种基于谱矩方法的数据驱动的局部错误发现率估计方法。我们的方法对潜在物理现象的特定空间传播模型是不可知的。它依靠广泛适用的密度模型进行局部汇总统计。在传感器之间,根据插值的局部错误发现率将位置分配给与不同假设相关的区域。空间传播无线电波的应用说明了我们方法的优点。 摘要:The problem of identifying regions of spatially interesting, different or adversarial behavior is inherent to many practical applications involving distributed multisensor systems. In this work, we develop a general framework stemming from multiple hypothesis testing to identify such regions. A discrete spatial grid is assumed for the monitored environment. The spatial grid points associated with different hypotheses are identified while controlling the false discovery rate at a pre-specified level. Measurements are acquired using a large-scale sensor network. We propose a novel, data-driven method to estimate local false discovery rates based on the spectral method of moments. Our method is agnostic to specific spatial propagation models of the underlying physical phenomenon. It relies on a broadly applicable density model for local summary statistics. In between sensors, locations are assigned to regions associated with different hypotheses based on interpolated local false discovery rates. The benefits of our method are illustrated by applications to spatially propagating radio waves.
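该框架的核心约束是把错误发现率(FDR)控制在预设水平。下面给出经典 Benjamini-Hochberg 过程的最小实现,以演示 FDR 控制本身(论文实际采用基于谱矩法的局部 FDR 估计与传感器间插值,此处不涉及;p 值数据为占位模拟):

```python
# 示意:Benjamini-Hochberg 过程,在水平 alpha 下控制错误发现率
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    m = len(pvals)
    order = np.argsort(pvals)
    sorted_p = pvals[order]
    below = sorted_p <= alpha * np.arange(1, m + 1) / m
    rejected = np.zeros(m, dtype=bool)
    if below.any():
        k = np.nonzero(below)[0].max()       # 最大的满足阈值的序号
        rejected[order[: k + 1]] = True      # 拒绝前 k+1 个最小的 p 值
    return rejected

rng = np.random.default_rng(0)
p_null = rng.uniform(size=900)               # 零假设下的网格点
p_alt = rng.beta(0.1, 10, size=100)          # 异常网格点(偏小的 p 值)
pvals = np.concatenate([p_null, p_alt])
rej = benjamini_hochberg(pvals)
print("发现数:", int(rej.sum()), " 其中误报:", int(rej[:900].sum()))
```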

机器翻译,仅供参考
