
Machine Learning Academic Digest [7.13]

By: arXiv每日学术速递 (WeChat official account)
Published 2021-07-27 10:51:03
This article is included in the column: arXiv每日学术速递

Visit www.arxivdaily.com for the digest with abstracts, covering CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, bookmarking, and posting features.

cs.LG: 134 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】 Towards Better Laplacian Representation in Reinforcement Learning with Generalized Graph Drawing

Authors: Kaixin Wang, Kuangqi Zhou, Qixin Zhang, Jie Shao, Bryan Hooi, Jiashi Feng Affiliations: National University of Singapore; City University of Hong Kong; ByteDance AI Lab (equal contribution) Note: ICML 2021 Link: https://arxiv.org/abs/2107.05545 Abstract: The Laplacian representation recently gains increasing attention for reinforcement learning as it provides succinct and informative representation for states, by taking the eigenvectors of the Laplacian matrix of the state-transition graph as state embeddings. Such representation captures the geometry of the underlying state space and is beneficial to RL tasks such as option discovery and reward shaping. To approximate the Laplacian representation in large (or even continuous) state spaces, recent works propose to minimize a spectral graph drawing objective, which however has infinitely many global minimizers other than the eigenvectors. As a result, their learned Laplacian representation may differ from the ground truth. To solve this problem, we reformulate the graph drawing objective into a generalized form and derive a new learning objective, which is proved to have eigenvectors as its unique global minimizer. It enables learning high-quality Laplacian representations that faithfully approximate the ground truth. We validate this via comprehensive experiments on a set of gridworld and continuous control environments. Moreover, we show that our learned Laplacian representations lead to more exploratory options and better reward shaping.
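
The "ground truth" referenced above is well defined: the Laplacian representation embeds each state via the eigenvectors of the graph Laplacian of the state-transition graph. Below is a minimal NumPy sketch of that ground-truth computation on a toy chain MDP (my own illustration, not the paper's learned approximation; the graph and embedding dimension are arbitrary choices):

```python
import numpy as np

def laplacian_representation(adjacency: np.ndarray, d: int) -> np.ndarray:
    """Return the d eigenvectors of L = D - A with smallest eigenvalues."""
    degree = np.diag(adjacency.sum(axis=1))
    laplacian = degree - adjacency
    eigvals, eigvecs = np.linalg.eigh(laplacian)  # eigenvalues in ascending order
    return eigvecs[:, :d]                         # row i = d-dim embedding of state i

# Toy 4-state chain MDP: each state transitions to its neighbours.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(laplacian_representation(A, d=2))  # embeddings reflect the chain geometry
```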

【2】 Position-enhanced and Time-aware Graph Convolutional Network for Sequential Recommendations

Authors: Liwei Huang, Yutao Ma, Yanbo Liu, Shuliang Wang, Deyi Li Affiliations: School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Beijing Institute of Remote Sensing, Beijing, China; School of Computer Science, Wuhan University, Wuhan, China Note: 25 pages, 5 figures, 6 tables Link: https://arxiv.org/abs/2107.05235 Abstract: Most of the existing deep learning-based sequential recommendation approaches utilize the recurrent neural network architecture or self-attention to model the sequential patterns and temporal influence among a user's historical behavior and learn the user's preference at a specific time. However, these methods have two main drawbacks. First, they focus on modeling users' dynamic states from a user-centric perspective and always neglect the dynamics of items over time. Second, most of them deal with only the first-order user-item interactions and do not consider the high-order connectivity between users and items, which has recently been proved helpful for the sequential recommendation. To address the above problems, in this article, we attempt to model user-item interactions by a bipartite graph structure and propose a new recommendation approach based on a Position-enhanced and Time-aware Graph Convolutional Network (PTGCN) for the sequential recommendation. PTGCN models the sequential patterns and temporal dynamics between user-item interactions by defining a position-enhanced and time-aware graph convolution operation and learning the dynamic representations of users and items simultaneously on the bipartite graph with a self-attention aggregator. Also, it realizes the high-order connectivity between users and items by stacking multi-layer graph convolutions. To demonstrate the effectiveness of PTGCN, we carried out a comprehensive evaluation of PTGCN on three real-world datasets of different sizes compared with a few competitive baselines. Experimental results indicate that PTGCN outperforms several state-of-the-art models in terms of two commonly-used evaluation metrics for ranking.

【3】 MugRep: A Multi-Task Hierarchical Graph Representation Learning Framework for Real Estate Appraisal

Authors: Weijia Zhang, Hao Liu, Lijun Zha, Hengshu Zhu, Ji Liu, Dejing Dou, Hui Xiong Affiliations: School of Computer Science, University of Science and Technology of China; Baidu Research; Baidu Talent Intelligence Center, Baidu Inc.; Rutgers University Note: 11 pages, SIGKDD 2021 Link: https://arxiv.org/abs/2107.05180 Abstract: Real estate appraisal refers to the process of developing an unbiased opinion for real property's market value, which plays a vital role in decision-making for various players in the marketplace (e.g., real estate agents, appraisers, lenders, and buyers). However, it is a nontrivial task for accurate real estate appraisal because of three major challenges: (1) The complicated influencing factors for property value; (2) The asynchronously spatiotemporal dependencies among real estate transactions; (3) The diversified correlations between residential communities. To this end, we propose a Multi-Task Hierarchical Graph Representation Learning (MugRep) framework for accurate real estate appraisal. Specifically, by acquiring and integrating multi-source urban data, we first construct a rich feature set to comprehensively profile the real estate from multiple perspectives (e.g., geographical distribution, human mobility distribution, and resident demographics distribution). Then, an evolving real estate transaction graph and a corresponding event graph convolution module are proposed to incorporate asynchronously spatiotemporal dependencies among real estate transactions. Moreover, to further incorporate valuable knowledge from the view of residential communities, we devise a hierarchical heterogeneous community graph convolution module to capture diversified correlations between residential communities. Finally, an urban district partitioned multi-task learning module is introduced to generate differently distributed value opinions for real estate. Extensive experiments on two real-world datasets demonstrate the effectiveness of MugRep and its components and features.

【4】 BrainNNExplainer: An Interpretable Graph Neural Network Framework for Brain Network based Disease Analysis

Authors: Hejie Cui, Wei Dai, Yanqiao Zhu, Xiaoxiao Li, Lifang He, Carl Yang Affiliations: Department of Computer Science, Emory University; Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence Note: Accepted to the ICML 2021 Workshop on Interpretable Machine Learning in Healthcare Link: https://arxiv.org/abs/2107.05097 Abstract: Interpretable brain network models for disease prediction are of great value for the advancement of neuroscience. GNNs are promising to model complicated network data, but they are prone to overfitting and suffer from poor interpretability, which prevents their usage in decision-critical scenarios like healthcare. To bridge this gap, we propose BrainNNExplainer, an interpretable GNN framework for brain network analysis. It is mainly composed of two jointly learned modules: a backbone prediction model that is specifically designed for brain networks and an explanation generator that highlights disease-specific prominent brain network connections. Extensive experimental results with visualizations on two challenging disease prediction datasets demonstrate the unique interpretability and outstanding performance of BrainNNExplainer.

【5】 Zero-Shot Scene Graph Relation Prediction through Commonsense Knowledge Integration

Authors: Xuan Kan, Hejie Cui, Carl Yang Affiliations: Department of Computer Science, Emory University Note: Accepted for presentation in the Research Track of ECML-PKDD 2021 Link: https://arxiv.org/abs/2107.05080 Abstract: Relation prediction among entities in images is an important step in scene graph generation (SGG), which further impacts various visual understanding and reasoning tasks. Existing SGG frameworks, however, require heavy training yet are incapable of modeling unseen (i.e., zero-shot) triplets. In this work, we stress that such incapability is due to the lack of commonsense reasoning, i.e., the ability to associate similar entities and infer similar relations based on general understanding of the world. To fill this gap, we propose CommOnsense-integrAted sCenegrapHrElation pRediction (COACHER), a framework to integrate commonsense knowledge for SGG, especially for zero-shot relation prediction. Specifically, we develop novel graph mining pipelines to model the neighborhoods and paths around entities in an external commonsense knowledge graph, and integrate them on top of state-of-the-art SGG frameworks. Extensive quantitative evaluations and qualitative case studies on both original and manipulated datasets from Visual Genome demonstrate the effectiveness of our proposed approach.

【6】 STR-GODEs: Spatial-Temporal-Ridership Graph ODEs for Metro Ridership Prediction

Authors: Chuyu Huang Affiliations: Peking University Link: https://arxiv.org/abs/2107.04980 Abstract: The metro ridership prediction has always received extensive attention from governments and researchers. Recent works focus on designing complicated graph convolutional recurrent network architectures to capture spatial and temporal patterns. These works extract the information of spatial dimension well, but the limitation of temporal dimension still exists. We extended Neural ODE algorithms to the graph network and proposed the STR-GODEs network, which can effectively learn spatial, temporal, and ridership correlations without the limitation of dividing data into equal-sized intervals on the timeline. While learning the spatial relations and the temporal correlations, we modify the GODE-RNN cell to obtain the ridership feature and hidden states. Ridership information and its hidden states are added to the GODESolve to reduce the error accumulation caused by long time series in prediction. Extensive experiments on two large-scale datasets demonstrate the efficacy and robustness of our model.
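
The property the abstract leans on, freedom from equal-sized time intervals, is inherited from Neural ODEs: a learned continuous dynamics can be queried at arbitrary timestamps. A hedged sketch of just that ingredient, assuming the torchdiffeq library is available (the dynamics network and irregular timestamps are illustrative; this is not the STR-GODEs architecture):

```python
import torch
from torchdiffeq import odeint

class Dynamics(torch.nn.Module):
    """Learned time derivative dh/dt of a hidden state h."""
    def __init__(self, dim: int = 16):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(dim, 64), torch.nn.Tanh(), torch.nn.Linear(64, dim))

    def forward(self, t, h):
        return self.net(h)

h0 = torch.zeros(1, 16)                            # initial hidden state
t_obs = torch.tensor([0.0, 0.3, 1.1, 1.15, 2.8])   # irregular observation times
states = odeint(Dynamics(), h0, t_obs)             # one hidden state per timestamp
print(states.shape)                                # torch.Size([5, 1, 16])
```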

【7】 Beyond Low-pass Filtering: Graph Convolutional Networks with Automatic Filtering

Authors: Zonghan Wu, Shirui Pan, Guodong Long, Jing Jiang, Chengqi Zhang Affiliations: Pan is with the Department of Data Science and AI Note: 11 pages Link: https://arxiv.org/abs/2107.04755 Abstract: Graph convolutional networks are becoming indispensable for deep learning from graph-structured data. Most of the existing graph convolutional networks share two big shortcomings. First, they are essentially low-pass filters, thus the potentially useful middle and high frequency band of graph signals are ignored. Second, the bandwidth of existing graph convolutional filters is fixed. Parameters of a graph convolutional filter only transform the graph inputs without changing the curvature of a graph convolutional filter function. In reality, we are uncertain about whether we should retain or cut off the frequency at a certain point unless we have expert domain knowledge. In this paper, we propose Automatic Graph Convolutional Networks (AutoGCN) to capture the full spectrum of graph signals and automatically update the bandwidth of graph convolutional filters. While it is based on graph spectral theory, our AutoGCN is also localized in space and has a spatial form. Experimental results show that AutoGCN achieves significant improvement over baseline methods which only work as low-pass filters.
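
The low-pass claim can be made concrete. The standard GCN propagation matrix shares eigenvectors with the self-loop-normalized Laplacian, and its spectral response is g(lambda) = 1 - lambda over the graph frequencies, so high-frequency components are attenuated. A small NumPy illustration on a toy graph (mine, not code from the paper):

```python
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_tilde = A + np.eye(len(A))                  # add self-loops
d_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
P = d_inv_sqrt @ A_tilde @ d_inv_sqrt         # GCN propagation matrix
L = np.eye(len(A)) - P                        # normalized Laplacian (with self-loops)

freqs, _ = np.linalg.eigh(L)                  # graph frequencies, ascending
response = 1.0 - freqs                        # spectral response of P per frequency
print(np.round(freqs, 3))
print(np.round(response, 3))                  # shrinks as frequency grows: low-pass
```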

【8】 Automated Graph Learning via Population Based Self-Tuning GCN

Authors: Ronghang Zhu, Zhiqiang Tao, Yaliang Li, Sheng Li Affiliations: Department of Computer Science, University of Georgia, Athens, GA; Department of Computer Science and Engineering, Santa Clara University, Santa Clara, CA; Alibaba Group, Bellevue, WA Note: Accepted by SIGIR 2021 Link: https://arxiv.org/abs/2107.04713 Abstract: Owing to the remarkable capability of extracting effective graph embeddings, graph convolutional network (GCN) and its variants have been successfully applied to a broad range of tasks, such as node classification, link prediction, and graph classification. Traditional GCN models suffer from the issues of overfitting and oversmoothing, while some recent techniques like DropEdge could alleviate these issues and thus enable the development of deep GCN. However, training GCN models is non-trivial, as it is sensitive to the choice of hyperparameters such as dropout rate and learning weight decay, especially for deep GCN models. In this paper, we aim to automate the training of GCN models through hyperparameter optimization. To be specific, we propose a self-tuning GCN approach with an alternate training algorithm, and further extend our approach by incorporating the population based training scheme. Experimental results on three benchmark datasets demonstrate the effectiveness of our approaches on optimizing multi-layer GCN, compared with several representative baselines.
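
For context, the population-based training scheme the authors incorporate keeps several model copies whose hyperparameters (e.g. the dropout rate and weight decay highlighted in the abstract) are periodically copied from stronger members and perturbed. A generic exploit/explore sketch of that loop, with population size, cutoff, and perturbation factors of my own choosing (not the paper's algorithm):

```python
import copy
import random

def pbt_step(population, perturb=0.2):
    """population: list of dicts with 'hparams' and a validation 'score'."""
    ranked = sorted(population, key=lambda m: m["score"], reverse=True)
    cutoff = max(1, len(ranked) // 4)
    for loser in ranked[-cutoff:]:
        winner = random.choice(ranked[:cutoff])
        loser["hparams"] = copy.deepcopy(winner["hparams"])        # exploit
        for k in loser["hparams"]:                                 # explore
            loser["hparams"][k] *= random.choice([1 - perturb, 1 + perturb])
            # (clipping to valid ranges, e.g. dropout < 1, omitted for brevity)
    return population

population = [{"hparams": {"dropout": random.uniform(0.1, 0.7),
                           "weight_decay": 10 ** random.uniform(-5, -2)},
               "score": 0.0} for _ in range(8)]
# Real loop: briefly train each member, write its validation accuracy into
# "score", call pbt_step(population), and repeat.
```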

Transformer (4 papers)

【1】 Calliope -- A Polyphonic Music Transformer

Authors: Andrea Valenti, Stefano Berti, Davide Bacciu Affiliations: University of Pisa, Dept. of Computer Science, Largo B. Pontecorvo, Pisa, Italy Note: Accepted at ESANN 2021 Link: https://arxiv.org/abs/2107.05546 Abstract: The polyphonic nature of music makes the application of deep learning to music modelling a challenging task. On the other hand, the Transformer architecture seems to be a good fit for this kind of data. In this work, we present Calliope, a novel autoencoder model based on Transformers for the efficient modelling of multi-track sequences of polyphonic music. The experiments show that our model is able to improve the state of the art on musical sequence reconstruction and generation, with remarkably good results especially on long sequences.

【2】 The Brownian motion in the transformer model

Authors: Yingshi Chen Note: 9 pages Link: https://arxiv.org/abs/2107.05264 Abstract: Transformer is the state of the art model for many language and visual tasks. In this paper, we give a deep analysis of its multi-head self-attention (MHSA) module and find that: 1) Each token is a random variable in high dimensional feature space. 2) After layer normalization, these variables are mapped to points on the hyper-sphere. 3) The update of these tokens is a Brownian motion. The Brownian motion has special properties, its second order item should not be ignored. So we present a new second-order optimizer (an iterative K-FAC algorithm) for the MHSA module. In some short words: All tokens are mapped to high dimension hyper-sphere. The Scaled Dot-Product Attention $softmax(\frac{\mathbf{Q}\mathbf{K}^T}{\sqrt{d}})$ is just the Markov transition matrix for the random walking on the sphere. And the deep learning process would learn proper kernel function to get proper positions of these tokens. The training process in the MHSA module corresponds to a Brownian motion worthy of further study.
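
The abstract's central observation, that scaled dot-product attention yields a Markov transition matrix, reduces to the fact that every softmax row is a probability vector. A short NumPy check with toy shapes of my choosing:

```python
import numpy as np

def attention_matrix(Q: np.ndarray, K: np.ndarray) -> np.ndarray:
    """softmax(Q K^T / sqrt(d)), computed row-wise."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    exp = np.exp(scores)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q, K = rng.normal(size=(5, 16)), rng.normal(size=(5, 16))
A = attention_matrix(Q, K)
print(np.allclose(A.sum(axis=1), 1.0))  # True: rows are probability vectors,
                                        # i.e. a valid Markov transition matrix
```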

【3】 TransClaw U-Net: Claw U-Net with Transformers for Medical Image Segmentation

Authors: Yao Chang, Hu Menghan, Zhai Guangtao, Zhang Xiao-Ping Affiliations: Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University; Institute of Image Communication and Information Processing, Shanghai Jiao Tong University; Department of Electrical Note: 8 pages, 3 figures Link: https://arxiv.org/abs/2107.05188 Abstract: In recent years, computer-aided diagnosis has become an increasingly popular topic. Methods based on convolutional neural networks have achieved good performance in medical image segmentation and classification. Due to the limitations of the convolution operation, the long-term spatial features are often not accurately obtained. Hence, we propose a TransClaw U-Net network structure, which combines the convolution operation with the transformer operation in the encoding part. The convolution part is applied for extracting the shallow spatial features to facilitate the recovery of the image resolution after upsampling. The transformer part is used to encode the patches, and the self-attention mechanism is used to obtain global information between sequences. The decoding part retains the bottom upsampling structure for better detail segmentation performance. The experimental results on Synapse Multi-organ Segmentation Datasets show that the performance of TransClaw U-Net is better than other network structures. The ablation experiments also prove the generalization performance of TransClaw U-Net.

【4】 Transformers with multi-modal features and post-fusion context for e-commerce session-based recommendation

Authors: Gabriel de Souza P. Moreira, Sara Rabhi, Ronay Ak, Md Yasin Kabir, Even Oldridge Affiliations: NVIDIA (São Paulo, Brazil; Ontario, Canada; Florida, United States; Missouri, United States; British Columbia, Canada) Note: In Proceedings of SIGIR eCom'21 - SIGIR eCommerce Workshop Data Challenge 2021. this https URL Link: https://arxiv.org/abs/2107.05124 Abstract: Session-based recommendation is an important task for e-commerce services, where a large number of users browse anonymously or may have very distinct interests for different sessions. In this paper we present one of the winning solutions for the Recommendation task of the SIGIR 2021 Workshop on E-commerce Data Challenge. Our solution was inspired by NLP techniques and consists of an ensemble of two Transformer architectures - Transformer-XL and XLNet - trained with autoregressive and autoencoding approaches. To leverage most of the rich dataset made available for the competition, we describe how we prepared multi-model features by combining tabular events with textual and image vectors. We also present a model prediction analysis to better understand the effectiveness of our architectures for the session-based recommendation.

GAN | Adversarial | Attacks | Generation (10 papers)

【1】 Automated Label Generation for Time Series Classification with Representation Learning: Reduction of Label Cost for Training

Authors: Soma Bandyopadhyay, Anish Datta, Arpan Pal Affiliations: TCS Research, TATA Consultancy Services, Kolkata, India Note: 8 pages, 5 figures, 3 tables; accepted at the IJCAI 2021 Weakly Supervised Representation Learning (WSRL) Workshop; this https URL Link: https://arxiv.org/abs/2107.05458 Abstract: Time-series generated by end-users, edge devices, and different wearables are mostly unlabelled. We propose a method to auto-generate labels of un-labelled time-series, exploiting very few representative labelled time-series. Our method is based on representation learning using Auto Encoded Compact Sequence (AECS) with a choice of best distance measure. It performs self-correction in iterations, by learning latent structure, as well as synthetically boosting representative time-series using Variational-Auto-Encoder (VAE) to improve the quality of labels. We have experimented with UCR and UCI archives, public real-world univariate, multivariate time-series taken from different application domains. Experimental results demonstrate that the proposed method is very close to the performance achieved by fully supervised classification. The proposed method not only produces close to benchmark results but outperforms the benchmark performance in some cases.

【2】 Prb-GAN: A Probabilistic Framework for GAN Modelling

Authors: Blessen George, Vinod K. Kurmi, Vinay P. Namboodiri Affiliations: Indian Institute of Technology Kanpur, Kanpur, India Link: https://arxiv.org/abs/2107.05241 Abstract: Generative adversarial networks (GANs) are very popular to generate realistic images, but they often suffer from the training instability issues and the phenomenon of mode loss. In order to attain greater diversity in GAN synthesized data, it is critical to solving the problem of mode loss. Our work explores probabilistic approaches to GAN modelling that could allow us to tackle these issues. We present Prb-GANs, a new variation that uses dropout to create a distribution over the network parameters with the posterior learnt using variational inference. We describe theoretically and validate experimentally using simple and complex datasets the benefits of such an approach. We look into further improvements using the concept of uncertainty measures. Through a set of further modifications to the loss functions for each network of the GAN, we are able to get results that show the improvement of GAN performance. Our methods are extremely simple and require very little modification to existing GAN architecture.

【3】 Stateful Detection of Model Extraction Attacks

Authors: Soham Pal, Yash Gupta, Aditya Kanade, Shirish Shevade Affiliations: Indian Institute of Science, Bangalore; nference Link: https://arxiv.org/abs/2107.05166 Abstract: Machine-Learning-as-a-Service providers expose machine learning (ML) models through application programming interfaces (APIs) to developers. Recent work has shown that attackers can exploit these APIs to extract good approximations of such ML models, by querying them with samples of their choosing. We propose VarDetect, a stateful monitor that tracks the distribution of queries made by users of such a service, to detect model extraction attacks. Harnessing the latent distributions learned by a modified variational autoencoder, VarDetect robustly separates three types of attacker samples from benign samples, and successfully raises an alarm for each. Further, with VarDetect deployed as an automated defense mechanism, the extracted substitute models are found to exhibit poor performance and transferability, as intended. Finally, we demonstrate that even adaptive attackers with prior knowledge of the deployment of VarDetect, are detected by it.

【4】 Attack Rules: An Adversarial Approach to Generate Attacks for Industrial Control Systems using Machine Learning

Authors: Muhammad Azmi Umer, Chuadhry Mujeeb Ahmed, Muhammad Taha Jilani, Aditya P. Mathur Affiliations: DHA Suffa University; Karachi Institute of Economics and Technology; University of Strathclyde; Singapore University of Technology and Design Link: https://arxiv.org/abs/2107.05127 Abstract: Adversarial learning is used to test the robustness of machine learning algorithms under attack and create attacks that deceive the anomaly detection methods in Industrial Control System (ICS). Given that security assessment of an ICS demands that an exhaustive set of possible attack patterns is studied, in this work, we propose an association rule mining-based attack generation technique. The technique has been implemented using data from a secure Water Treatment plant. The proposed technique was able to generate more than 300,000 attack patterns constituting a vast majority of new attack vectors which were not seen before. Automatically generated attacks improve our understanding of the potential attacks and enable the design of robust attack detection techniques.
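
As a rough illustration of the rule-mining ingredient (not the authors' pipeline: the mlxtend library and its apriori/association_rules signatures, the toy plant log, and the thresholds are all my assumptions), one can mine high-confidence rules from normal process states and then build candidate attacks that satisfy a rule's antecedent while violating its consequent:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Toy one-hot log of plant states (rows = time steps under normal operation).
log = pd.DataFrame({
    "pump_on":    [1, 1, 1, 0, 1, 1],
    "valve_open": [1, 1, 1, 0, 1, 0],
    "level_high": [1, 1, 0, 0, 1, 0],
}).astype(bool)

frequent = apriori(log, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.9)
print(rules[["antecedents", "consequents", "confidence"]])
# A candidate attack pattern then keeps an antecedent true while forcing the
# consequent false, e.g. level_high held high but pump_on spoofed off.
```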

【5】 Out of Distribution Detection and Adversarial Attacks on Deep Neural Networks for Robust Medical Image Analysis

Authors: Anisie Uwimana, Ransalu Senanayake Affiliations: African Institute for Mathematical Sciences (AIMS) Link: https://arxiv.org/abs/2107.04882 Abstract: Deep learning models have become a popular choice for medical image analysis. However, the poor generalization performance of deep learning models limits them from being deployed in the real world as robustness is critical for medical applications. For instance, the state-of-the-art Convolutional Neural Networks (CNNs) fail to detect adversarial samples or samples drawn statistically far away from the training distribution. In this work, we experimentally evaluate the robustness of a Mahalanobis distance-based confidence score, a simple yet effective method for detecting abnormal input samples, in classifying malaria parasitized cells and uninfected cells. Results indicated that the Mahalanobis confidence score detector exhibits improved performance and robustness of deep learning models, and achieves state-of-the-art performance on both out-of-distribution (OOD) and adversarial samples.
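
The detector under evaluation follows a known recipe: fit class-conditional Gaussians with a shared covariance on a network's penultimate-layer features, and score a test point by its distance to the nearest class mean. A hedged NumPy sketch under those assumptions (feature extraction and threshold selection omitted):

```python
import numpy as np

def fit_gaussians(features: np.ndarray, labels: np.ndarray):
    """features: (N, D) penultimate-layer features; labels: (N,) class ids."""
    classes = np.unique(labels)
    means = np.stack([features[labels == c].mean(axis=0) for c in classes])
    centered = features - means[np.searchsorted(classes, labels)]
    cov = centered.T @ centered / len(features)      # tied covariance estimate
    return means, np.linalg.pinv(cov)

def mahalanobis_confidence(x: np.ndarray, means, cov_inv) -> float:
    diffs = means - x                                # (num_classes, D)
    dists = np.einsum("cd,de,ce->c", diffs, cov_inv, diffs)
    return -dists.min()   # higher = more in-distribution; low scores flag OOD
```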

【6】 Identifying Layers Susceptible to Adversarial Attacks

Authors: Shoaib Ahmed Siddiqui, Thomas Breuel Affiliations: German Research Center for Artificial Intelligence (DFKI); TU Kaiserslautern; NVIDIA Research Link: https://arxiv.org/abs/2107.04827 Abstract: Common neural network architectures are susceptible to attack by adversarial samples. Neural network architectures are commonly thought of as divided into low-level feature extraction layers and high-level classification layers; susceptibility of networks to adversarial samples is often thought of as a problem related to classification rather than feature extraction. We test this idea by selectively retraining different portions of VGG and ResNet architectures on CIFAR-10, Imagenette and ImageNet using non-adversarial and adversarial data. Our experimental results show that susceptibility to adversarial samples is associated with low-level feature extraction layers. Therefore, retraining high-level layers is insufficient for achieving robustness. This phenomenon could have two explanations: either, adversarial attacks yield outputs from early layers that are indistinguishable from features found in the attack classes, or adversarial attacks yield outputs from early layers that differ statistically from features for non-adversarial samples and do not permit consistent classification by subsequent layers. We test this question by large-scale non-linear dimensionality reduction and density modeling on distributions of feature vectors in hidden layers and find that the feature distributions between non-adversarial and adversarial samples differ substantially. Our results provide new insights into the statistical origins of adversarial samples and possible defenses.
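
The selective-retraining protocol is a few lines of PyTorch: freeze the whole network, then unfreeze only the span of layers being studied. In this sketch the model, the slice boundary, and the optimizer settings are my own illustrative choices, not the paper's exact configuration:

```python
import torch
import torchvision

model = torchvision.models.vgg16(weights=None)
for p in model.parameters():
    p.requires_grad = False                 # freeze everything
for p in model.features[:10].parameters():
    p.requires_grad = True                  # retrain only early conv layers

optimizer = torch.optim.SGD(
    [p for p in model.parameters() if p.requires_grad], lr=1e-3, momentum=0.9)
# ...then run a standard training loop on adversarial or clean batches and
# compare robustness depending on which span was unfrozen.
```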

【7】 A Framework and Benchmarking Study for Counterfactual Generating Methods on Tabular Data

Authors: Raphael Mazzine, David Martens Link: https://arxiv.org/abs/2107.04680 Abstract: Counterfactual explanations are viewed as an effective way to explain machine learning predictions. This interest is reflected by a relatively young literature with already dozens of algorithms aiming to generate such explanations. These algorithms are focused on finding how features can be modified to change the output classification. However, this rather general objective can be achieved in different ways, which brings about the need for a methodology to test and benchmark these algorithms. The contributions of this work are manifold: First, a large benchmarking study of 10 algorithmic approaches on 22 tabular datasets is performed, using 9 relevant evaluation metrics. Second, the introduction of a novel, first of its kind, framework to test counterfactual generation algorithms. Third, a set of objective metrics to evaluate and compare counterfactual results. And finally, insight from the benchmarking results that indicate which approaches obtain the best performance on what type of dataset. This benchmarking study and framework can help practitioners in determining which technique and building blocks most suit their context, and can help researchers in the design and evaluation of current and future counterfactual generation algorithms. Our findings show that, overall, there's no single best algorithm to generate counterfactual explanations as the performance highly depends on properties related to the dataset, model, score and factual point specificities.

【8】 Diverse Video Generation using a Gaussian Process Trigger

Authors: Gaurav Shrivastava, Abhinav Shrivastava Affiliations: University of Maryland, College Park Note: International Conference on Learning Representations, 2021 Link: https://arxiv.org/abs/2107.04619 Abstract: Generating future frames given a few context (or past) frames is a challenging task. It requires modeling the temporal coherence of videos and multi-modality in terms of diversity in the potential future states. Current variational approaches for video generation tend to marginalize over multi-modal future outcomes. Instead, we propose to explicitly model the multi-modality in the future outcomes and leverage it to sample diverse futures. Our approach, Diverse Video Generator, uses a Gaussian Process (GP) to learn priors on future states given the past and maintains a probability distribution over possible futures given a particular sample. In addition, we leverage the changes in this distribution over time to control the sampling of diverse future states by estimating the end of ongoing sequences. That is, we use the variance of GP over the output function space to trigger a change in an action sequence. We achieve state-of-the-art results on diverse future frame generation in terms of reconstruction quality and diversity of the generated sequences.
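
A minimal illustration of the trigger mechanism as described (the 1-D stand-in for latent states, the kernel, and the threshold are my assumptions): fit a GP to the sequence so far and branch to a new future once the predictive variance beyond the observed frames grows large:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

t_seen = np.arange(8, dtype=float).reshape(-1, 1)   # observed time steps
z_seen = np.sin(0.5 * t_seen).ravel()               # 1-D stand-in for latent states
gp = GaussianProcessRegressor(kernel=RBF(length_scale=2.0)).fit(t_seen, z_seen)

t_future = np.arange(8, 16, dtype=float).reshape(-1, 1)
mean, std = gp.predict(t_future, return_std=True)
trigger = std > 0.5                                 # hypothetical variance threshold
print(trigger)  # True where uncertainty is high enough to sample a new future
```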

【9】 Perceptual-based deep-learning denoiser as a defense against adversarial attacks on ASR systems

Authors: Anirudh Sreeram, Nicholas Mehlman, Raghuveer Peri, Dillon Knox, Shrikanth Narayanan Affiliations: Signal Analysis and Interpretation Laboratory (SAIL), Ming Hsieh Department of Electrical and Computer Engineering, Viterbi School of Engineering, University of Southern California Note: 5 pages, 4 figures; submitted to ASRU 2021 Link: https://arxiv.org/abs/2107.05222 Abstract: In this paper we investigate speech denoising as a defense against adversarial attacks on automatic speech recognition (ASR) systems. Adversarial attacks attempt to force misclassification by adding small perturbations to the original speech signal. We propose to counteract this by employing a neural-network based denoiser as a pre-processor in the ASR pipeline. The denoiser is independent of the downstream ASR model, and thus can be rapidly deployed in existing systems. We found that training the denoiser using a perceptually motivated loss function resulted in increased adversarial robustness without compromising ASR performance on benign samples. Our defense was evaluated (as a part of the DARPA GARD program) on the 'Kenansville' attack strategy across a range of attack strengths and speech samples. An average improvement in Word Error Rate (WER) of about 7.7% was observed over the undefended model at 20 dB signal-to-noise-ratio (SNR) attack strength.

【10】 Generating stable molecules using imitation and reinforcement learning

Authors: Søren Ager Meldgaard, Jonas Köhler, Henrik Lund Mortensen, Mads-Peter V. Christiansen, Frank Noé, Bjørk Hammer Affiliations: InterCat and Department of Physics and Astronomy, Aarhus University, Denmark; Freie Universität Berlin, Department of Mathematics and Computer Science; Freie Universität Berlin, Department of Physics, Berlin, Germany Link: https://arxiv.org/abs/2107.05007 Abstract: Chemical space is routinely explored by machine learning methods to discover interesting molecules, before time-consuming experimental synthesizing is attempted. However, these methods often rely on a graph representation, ignoring 3D information necessary for determining the stability of the molecules. We propose a reinforcement learning approach for generating molecules in cartesian coordinates allowing for quantum chemical prediction of the stability. To improve sample-efficiency we learn basic chemical rules from imitation learning on the GDB-11 database to create an initial model applicable for all stoichiometries. We then deploy multiple copies of the model conditioned on a specific stoichiometry in a reinforcement learning setting. The models correctly identify low energy molecules in the database and produce novel isomers not found in the training set. Finally, we apply the model to larger molecules to show how reinforcement learning further refines the imitation learning model in domains far from the training data.

Semi-/Weakly-/Un-/Supervised | Uncertainty | Active Learning (8 papers)

【1】 Active Divergence with Generative Deep Learning -- A Survey and Taxonomy

Authors: Terence Broad, Sebastian Berns, Simon Colton, Mick Grierson Affiliations: Department of Computing, Goldsmiths, University of London, UK; Creative Computing Institute, University of the Arts London, UK; School of Electronic Engineering and Computer Science, Queen Mary University of London, UK Link: https://arxiv.org/abs/2107.05599 Abstract: Generative deep learning systems offer powerful tools for artefact generation, given their ability to model distributions of data and generate high-fidelity results. In the context of computational creativity, however, a major shortcoming is that they are unable to explicitly diverge from the training data in creative ways and are limited to fitting the target data distribution. To address these limitations, there have been a growing number of approaches for optimising, hacking and rewriting these models in order to actively diverge from the training data. We present a taxonomy and comprehensive survey of the state of the art of active divergence techniques, highlighting the potential for computational creativity researchers to advance these methods and use deep generative models in truly creative systems.

【2】 End-to-End Rich Transcription-Style Automatic Speech Recognition with Semi-Supervised Learning

Authors: Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima Affiliations: NTT Media Intelligence Laboratories, NTT Corporation Note: Accepted at Interspeech 2021 Link: https://arxiv.org/abs/2107.05382 Abstract: We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems from small-scale rich transcription-style and large-scale common transcription-style datasets. In spontaneous speech tasks, various speech phenomena such as fillers, word fragments, laughter and coughs, etc. are often included. While common transcriptions do not give special awareness to these phenomena, rich transcriptions explicitly convert them into special phenomenon tokens as well as textual tokens. In previous studies, the textual and phenomenon tokens were simultaneously estimated in an end-to-end manner. However, it is difficult to build accurate RT-ASR systems because large-scale rich transcription-style datasets are often unavailable. To solve this problem, our training method uses a limited rich transcription-style dataset and common transcription-style dataset simultaneously. The key process in our semi-supervised learning is to convert the common transcription-style dataset into a pseudo-rich transcription-style dataset. To this end, we introduce style tokens which control whether phenomenon tokens are generated or not into transformer-based autoregressive modeling. We use this modeling for generating the pseudo-rich transcription-style datasets and for building RT-ASR system from the pseudo and original datasets. Our experiments on spontaneous ASR tasks showed the effectiveness of the proposed method.

【3】 DISCO: efficient unsupervised decoding for discrete natural language problems via convex relaxation

Authors: Anish Acharya, Rudrajit Das, Greg Durrett, Inderjit Dhillon, Sujay Sanghavi Affiliations: University of Texas at Austin; Amazon Link: https://arxiv.org/abs/2107.05380 Abstract: In this paper we study test time decoding; an ubiquitous step in almost all sequential text generation task spanning across a wide array of natural language processing (NLP) problems. Our main contribution is to develop a continuous relaxation framework for the combinatorial NP-hard decoding problem and propose Disco - an efficient algorithm based on standard first-order gradient methods. We provide tight analysis and show that our proposed algorithm linearly converges to within $\epsilon$ neighborhood of the optima. Finally, we perform preliminary experiments on the task of adversarial text generation and show superior performance of Disco over several popular decoding approaches.

【4】 Self-service Data Classification Using Interactive Visualization and Interpretable Machine Learning

Authors: Sridevi Narayana Wagle, Boris Kovalerchuk Affiliations: Department of Computer Science, Central Washington University, USA Note: 37 pages, 33 figures, 7 tables Link: https://arxiv.org/abs/2107.04971 Abstract: Machine learning algorithms often produce models considered as complex black-box models by both end users and developers. They fail to explain the model in terms of the domain they are designed for. The proposed Iterative Visual Logical Classifier (IVLC) is an interpretable machine learning algorithm that allows end users to design a model and classify data with more confidence and without having to compromise on the accuracy. Such technique is especially helpful when dealing with sensitive and crucial data like cancer data in the medical domain with high cost of errors. With the help of the proposed interactive and lossless multidimensional visualization, end users can identify the pattern in the data based on which they can make explainable decisions. Such options would not be possible in black box machine learning methodologies. The interpretable IVLC algorithm is supported by the Interactive Shifted Paired Coordinates Software System (SPCVis). It is a lossless multidimensional data visualization system with user interactive features. The interactive approach provides flexibility to the end user to perform data classification as self-service without having to rely on a machine learning expert. Interactive pattern discovery becomes challenging while dealing with large data sets with hundreds of dimensions/features. To overcome this problem, this chapter proposes an automated classification approach combined with new Coordinate Order Optimizer (COO) algorithm and a Genetic algorithm. The COO algorithm automatically generates the coordinate pair sequences that best represent the data separation and the genetic algorithm helps optimizing the proposed IVLC algorithm by automatically generating the areas for data classification. The feasibility of the approach is shown by experiments on benchmark datasets covering both interactive and automated processes used for data classification.

【5】 ReconVAT: A Semi-Supervised Automatic Music Transcription Framework for Low-Resource Real-World Data

Authors: Kin Wai Cheuk, Dorien Herremans, Li Su Affiliations: Information Systems Technology and Design, Singapore University of Technology and Design; Institute of Information Science, Academia Sinica, Taiwan Note: Accepted at ACM MM 2021 Link: https://arxiv.org/abs/2107.04954 Abstract: Most of the current supervised automatic music transcription (AMT) models lack the ability to generalize. This means that they have trouble transcribing real-world music recordings from diverse musical genres that are not presented in the labelled training data. In this paper, we propose a semi-supervised framework, ReconVAT, which solves this issue by leveraging the huge amount of available unlabelled music recordings. The proposed ReconVAT uses reconstruction loss and virtual adversarial training. When combined with existing U-net models for AMT, ReconVAT achieves competitive results on common benchmark datasets such as MAPS and MusicNet. For example, in the few-shot setting for the string part version of MusicNet, ReconVAT achieves F1-scores of 61.0% and 41.6% for the note-wise and note-with-offset-wise metrics respectively, which translates into an improvement of 22.2% and 62.5% compared to the supervised baseline model. Our proposed framework also demonstrates the potential of continual learning on new data, which could be useful in real-world applications whereby new data is constantly available.
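
Virtual adversarial training (VAT), one of ReconVAT's two ingredients, is a standard unsupervised loss: find the small input perturbation that most changes the model's output distribution, then penalize that sensitivity. A generic PyTorch sketch, not the authors' implementation (xi, eps, and a single power iteration are conventional defaults):

```python
import torch
import torch.nn.functional as F

def vat_loss(model, x, xi=1e-6, eps=1.0, n_power=1):
    """Generic VAT loss for a batch of unlabelled inputs x."""
    with torch.no_grad():
        pred = F.softmax(model(x), dim=-1)          # current predictions
    d = torch.randn_like(x)                         # random initial direction
    for _ in range(n_power):                        # power iteration
        d = xi * F.normalize(d.flatten(1), dim=1).view_as(x)
        d.requires_grad_()
        kl = F.kl_div(F.log_softmax(model(x + d), dim=-1), pred,
                      reduction="batchmean")
        d = torch.autograd.grad(kl, d)[0].detach()  # most sensitive direction
    r_adv = eps * F.normalize(d.flatten(1), dim=1).view_as(x)
    return F.kl_div(F.log_softmax(model(x + r_adv), dim=-1), pred,
                    reduction="batchmean")
```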

【6】 Learn from Anywhere: Rethinking Generalized Zero-Shot Learning with Limited Supervision

Authors: Gaurav Bhatt, Shivam Chandok, Vineeth N Balasubramanian Affiliations: IIT Hyderabad Link: https://arxiv.org/abs/2107.04952 Abstract: A common problem with most zero and few-shot learning approaches is they suffer from bias towards seen classes resulting in sub-optimal performance. Existing efforts aim to utilize unlabeled images from unseen classes (i.e transductive zero-shot) during training to enable generalization. However, this limits their use in practical scenarios where data from target unseen classes is unavailable or infeasible to collect. In this work, we present a practical setting of inductive zero and few-shot learning, where unlabeled images from other out-of-data classes, that do not belong to seen or unseen categories, can be used to improve generalization in any-shot learning. We leverage a formulation based on product-of-experts and introduce a new AUD module that enables us to use unlabeled samples from out-of-data classes which are usually easily available and practically entail no annotation cost. In addition, we also demonstrate the applicability of our model to address a more practical and challenging, Generalized Zero-shot under a limited supervision setting, where even base seen classes do not have sufficient annotated samples.

【7】 Semi-Supervised Learning with Multi-Head Co-Training

Authors: Mingcai Chen, Yuntao Du, Yi Zhang, Shuwei Qian, Chongjun Wang Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China Link: https://arxiv.org/abs/2107.04795 Abstract: Co-training, extended from self-training, is one of the frameworks for semi-supervised learning. It works at the cost of training extra classifiers, where the algorithm should be delicately designed to prevent individual classifiers from collapsing into each other. In this paper, we present a simple and efficient co-training algorithm, named Multi-Head Co-Training, for semi-supervised image classification. By integrating base learners into a multi-head structure, the model is in a minimal amount of extra parameters. Every classification head in the unified model interacts with its peers through a "Weak and Strong Augmentation" strategy, achieving single-view co-training without promoting diversity explicitly. The effectiveness of Multi-Head Co-Training is demonstrated in an empirical study on standard semi-supervised learning benchmarks.
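
One hedged reading of the "Weak and Strong Augmentation" interaction in PyTorch: peer heads pseudo-label a weakly augmented view, and each head learns to reproduce the confident labels on a strongly augmented view. The peer averaging and confidence threshold below are my assumptions, not necessarily the paper's exact rule:

```python
import torch
import torch.nn.functional as F

def cotraining_loss(heads, backbone, x_weak, x_strong, threshold=0.95):
    """heads: list of >= 2 classification heads sharing one backbone."""
    f_weak, f_strong = backbone(x_weak), backbone(x_strong)
    loss = 0.0
    for i, head in enumerate(heads):
        with torch.no_grad():                       # peers provide the targets
            peer_logits = torch.stack(
                [h(f_weak) for j, h in enumerate(heads) if j != i])
            probs = F.softmax(peer_logits, dim=-1).mean(dim=0)
            conf, pseudo = probs.max(dim=-1)
            mask = (conf >= threshold).float()      # keep confident pseudo-labels
        logits = head(f_strong)
        loss += (F.cross_entropy(logits, pseudo, reduction="none") * mask).mean()
    return loss / len(heads)
```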

【8】 Layer-wise Analysis of a Self-supervised Speech Representation Model

Authors: Ankita Pasad, Ju-Chieh Chou, Karen Livescu Affiliations: Toyota Technological Institute at Chicago Link: https://arxiv.org/abs/2107.04734 Abstract: Recently proposed self-supervised learning approaches have been successful for pre-training speech representation models. The utility of these learned representations has been observed empirically, but not much has been studied about the type or extent of information encoded in the pre-trained representations themselves. Developing such insights can help understand the capabilities and limits of these models and enable the research community to more efficiently develop their usage for downstream applications. In this work, we begin to fill this gap by examining one recent and successful pre-trained model (wav2vec 2.0), via its intermediate representation vectors, using a suite of analysis tools. We use the metrics of canonical correlation, mutual information, and performance on simple downstream tasks with non-parametric probes, in order to (i) query for acoustic and linguistic information content, (ii) characterize the evolution of information across model layers, and (iii) understand how fine-tuning the model for automatic speech recognition (ASR) affects these observations. Our findings motivate modifying the fine-tuning protocol for ASR, which produces improved word error rates in a low-resource setting.
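
Of the three analysis tools, canonical correlation is the most self-contained to sketch. Below is a minimal scikit-learn version that scores the similarity between two layers' frame-level features (the layer pairing, component count, and offline feature extraction are assumptions on my part):

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def mean_cca_similarity(layer_a: np.ndarray, layer_b: np.ndarray, k: int = 10) -> float:
    """layer_a, layer_b: (num_frames, dim) representations from two layers;
    requires num_frames > k and dim >= k."""
    cca = CCA(n_components=k, max_iter=1000).fit(layer_a, layer_b)
    a_c, b_c = cca.transform(layer_a, layer_b)
    corrs = [np.corrcoef(a_c[:, i], b_c[:, i])[0, 1] for i in range(k)]
    return float(np.mean(corrs))

# e.g. compare wav2vec 2.0 layer 6 vs. layer 12 features extracted offline:
# sim = mean_cca_similarity(feats_layer6, feats_layer12)
```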

Transfer | Zero/Few/One-Shot | Adaptation (6 papers)

【1】 Learning and Adaptation in Millimeter-Wave: a Dual Timescale Variational Framework

Authors: Muddassar Hussain, Nicolo Michelusi Affiliations: Michelusi is with the School of Electrical Note: Submitted for publication in IEEE Journal on Selected Areas in Communications Link: https://arxiv.org/abs/2107.05466 Abstract: Millimeter-wave vehicular networks incur enormous beam-training overhead to enable narrow-beam communications. This paper proposes a learning and adaptation framework in which the dynamics of the communication beams are learned and then exploited to design adaptive beam-training with low overhead: on a long-timescale, a deep recurrent variational autoencoder (DR-VAE) uses noisy beam-training observations to learn a probabilistic model of beam dynamics; on a short-timescale, an adaptive beam-training procedure is formulated as a partially observable (PO-) Markov decision process (MDP) and optimized via point-based value iteration (PBVI) by leveraging beam-training feedback and a probabilistic prediction of the strongest beam pair provided by the DR-VAE. In turn, beam-training observations are used to refine the DR-VAE via stochastic gradient ascent in a continuous process of learning and adaptation. The proposed DR-VAE mobility learning framework learns accurate beam dynamics: it reduces the Kullback-Leibler divergence between the ground truth and the learned beam dynamics model by 86% over the Baum-Welch algorithm and by 92% over a naive mobility learning approach that neglects feedback errors. The proposed dual-timescale approach yields a negligible loss of spectral efficiency compared to a genie-aided scheme operating under error-free feedback and foreknown mobility model. Finally, a low-complexity policy is proposed by reducing the POMDP to an error-robust MDP. It is shown that the PBVI- and error-robust MDP-based policies improve the spectral efficiency by 85% and 67%, respectively, over a policy that scans exhaustively over the dominant beam pairs, and by 16% and 7%, respectively, over a state-of-the-art POMDP policy.

【2】 Source-Free Adaptation to Measurement Shift via Bottom-Up Feature Restoration

Authors: Cian Eastwood, Ian Mason, Christopher K. I. Williams, Bernhard Schölkopf Affiliations: School of Informatics, University of Edinburgh; Alan Turing Institute, London; Max Planck Institute for Intelligent Systems, Tübingen Link: https://arxiv.org/abs/2107.05446 Abstract: Source-free domain adaptation (SFDA) aims to adapt a model trained on labelled data in a source domain to unlabelled data in a target domain without access to the source-domain data during adaptation. Existing methods for SFDA leverage entropy-minimization techniques which: (i) apply only to classification; (ii) destroy model calibration; and (iii) rely on the source model achieving a good level of feature-space class-separation in the target domain. We address these issues for a particularly pervasive type of domain shift called measurement shift, characterized by a change in measurement system (e.g. a change in sensor or lighting). In the source domain, we store a lightweight and flexible approximation of the feature distribution under the source data. In the target domain, we adapt the feature-extractor such that the approximate feature distribution under the target data realigns with that saved on the source. We call this method Feature Restoration (FR) as it seeks to extract features with the same semantics from the target domain as were previously extracted from the source. We additionally propose Bottom-Up Feature Restoration (BUFR), a bottom-up training scheme for FR which boosts performance by preserving learnt structure in the later layers of a network. Through experiments we demonstrate that BUFR often outperforms existing SFDA methods in terms of accuracy, calibration, and data efficiency, while being less reliant on the performance of the source model in the target domain.

【3】 Disentangling Transfer and Interference in Multi-Domain Learning 标题:解缠多域学习中的迁移与干扰

作者:Yipeng Zhang,Tyler L. Hayes,Christopher Kanan 机构: University of Rochester, Rochester, NY, USA, Rochester Institute of Technology, Paige, New York, NY, USA, Cornell Tech 链接:https://arxiv.org/abs/2107.05445 摘要:人类非常善于将知识从一个领域迁移到另一个领域,从而能够快速学习新任务。同样,借助预训练的迁移学习已在许多计算机视觉问题上取得巨大成功。然而,在多域学习(网络学习由不同数据集定义的多个任务)中,迁移的好处尚未得到充分研究。学习多个域可能是有益的,但在网络容量有限的情况下,这些域也可能相互干扰。在这项工作中,我们剖析了多域学习中发生干扰和知识迁移的条件。我们提出了解缠干扰与迁移的新度量指标,并建立了相应的实验协议。我们进一步研究了网络容量、任务分组和动态损失加权在减少干扰和促进迁移方面的作用。我们在CIFAR-100、MiniPlaces和Tiny-ImageNet数据集上展示了我们的发现。 摘要:Humans are incredibly good at transferring knowledge from one domain to another, enabling rapid learning of new tasks. Likewise, transfer learning has enabled enormous success in many computer vision problems using pretraining. However, the benefits of transfer in multi-domain learning, where a network learns multiple tasks defined by different datasets, has not been adequately studied. Learning multiple domains could be beneficial or these domains could interfere with each other given limited network capacity. In this work, we decipher the conditions where interference and knowledge transfer occur in multi-domain learning. We propose new metrics disentangling interference and transfer and set up experimental protocols. We further examine the roles of network capacity, task grouping, and dynamic loss weighting in reducing interference and facilitating transfer. We demonstrate our findings on the CIFAR-100, MiniPlaces, and Tiny-ImageNet datasets.

【4】 Propagation-aware Social Recommendation by Transfer Learning 标题:基于迁移学习的传播感知社交推荐

作者:Haodong Chang,Yabo Chu 机构: University of Technology Sydney; Northeastern University 链接:https://arxiv.org/abs/2107.04846 摘要:社会感知推荐方法被认为是解决传统推荐系统数据稀疏问题的有效途径。其背后的假设是,社交用户-用户连接中的知识可以共享并迁移到用户-项目交互领域,从而帮助学习用户偏好。然而,现有方法大多只在迁移学习过程中采用用户之间的一阶连接,忽略了高阶连接。我们认为,更好的推荐性能同样可以受益于高阶社会关系。本文提出了一种基于社会关系传播的传播感知迁移学习网络(PTLN)。我们的目标是更好地挖掘隐藏在社交网络中的共享知识,从而进一步提高推荐性能。具体而言,我们从两个方面探讨社会影响:(a)通过阶偏置(order bias)将高阶朋友纳入考虑;(b)通过注意机制,同一阶的不同朋友对推荐具有不同的重要性。此外,我们还设计了一种新的正则化方法来弥合社会关系和用户-项目交互之间的鸿沟。我们在两个真实世界的数据集上进行了大量实验,在排序准确性上优于其他对比方法,特别是对于历史交互很少的冷启动用户。 摘要:Social-aware recommendation approaches have been recognized as an effective way to solve the data sparsity issue of traditional recommender systems. The assumption behind is that the knowledge in social user-user connections can be shared and transferred to the domain of user-item interactions, whereby to help learn user preferences. However, most existing approaches merely adopt the first-order connections among users during transfer learning, ignoring those connections in higher orders. We argue that better recommendation performance can also benefit from high-order social relations. In this paper, we propose a novel Propagation-aware Transfer Learning Network (PTLN) based on the propagation of social relations. We aim to better mine the sharing knowledge hidden in social networks and thus further improve recommendation performance. Specifically, we explore social influence in two aspects: (a) higher-order friends have been taken into consideration by order bias; (b) different friends in the same order will have distinct importance for recommendation by an attention mechanism. Besides, we design a novel regularization to bridge the gap between social relations and user-item interactions. We conduct extensive experiments on two real-world datasets and beat other counterparts in terms of ranking accuracy, especially for the cold-start users with few historical interactions.

【5】 Detection of Plant Leaf Disease Directly in the JPEG Compressed Domain using Transfer Learning Technique 标题:基于转移学习技术的JPEG压缩域直接检测植物叶部病害

作者:Atul Sharma,Bulla Rajesh,Mohammed Javed 机构:Computer Vision and Biometrics Laboratory (CVBL), Department of Information Technology, Indian Institute of Information Technology Allahabad, Prayagraj, U.P, India 备注:Accepted in MISP 2021 3rd International Conference On Machine Intelligence And Signal Processing 链接:https://arxiv.org/abs/2107.04813 摘要:植物叶片病害对粮食安全构成重大威胁,并导致产量和质量的下降。因此,准确、及时地检测叶片病害,对于遏制农作物损失、满足人们日益增长的粮食需求具有十分重要的意义。传统技术依赖实验室检测和人工技能,通常成本高昂且难以获得。近年来,深度神经网络在图像分类方面取得了非常丰硕的成果。本文研究了在JPEG压缩域中利用迁移学习直接进行植物叶片病害检测:将由DCT系数组成的JPEG压缩流直接输入神经网络,以提高分类效率。在JPEG压缩的树叶数据集上的实验结果验证了该模型的有效性。 摘要:Plant leaf diseases pose a significant danger to food security and they cause depletion in quality and volume of production. Therefore accurate and timely detection of leaf disease is very important to check the loss of the crops and meet the growing food demand of the people. Conventional techniques depend on lab investigation and human skills which are generally costly and inaccessible. Recently, Deep Neural Networks have been exceptionally fruitful in image classification. In this research paper, plant leaf disease detection employing transfer learning is explored in the JPEG compressed domain. Here, the JPEG compressed stream consisting of DCT coefficients is directly fed into the Neural Network to improve the efficiency of classification. The experimental results on JPEG compressed leaf dataset demonstrate the efficacy of the proposed model.
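下面的示意展示了"将DCT系数直接作为网络输入"的思路:把图像按 8x8 分块做二维DCT以模拟JPEG压缩流中的系数,再将64个系数视作通道张量输入CNN。分块方式与归一化均为示例假设,并非论文的原始流程。

```python
import numpy as np
from scipy.fftpack import dctn

def blockwise_dct(img):
    # 将灰度图划分为 8x8 块并做二维 DCT,模拟 JPEG 压缩流中的 DCT 系数
    h, w = (s - s % 8 for s in img.shape)
    img = img[:h, :w].astype(np.float32) - 128.0   # JPEG 式电平平移
    blocks = img.reshape(h // 8, 8, w // 8, 8).transpose(0, 2, 1, 3)
    return dctn(blocks, axes=(-2, -1), norm="ortho")  # (H/8, W/8, 8, 8)

coeffs = blockwise_dct(np.random.randint(0, 256, (224, 224)))
# 将每块的 64 个系数视为通道,得到 (64, H/8, W/8) 张量,可直接送入 CNN
x = coeffs.reshape(*coeffs.shape[:2], 64).transpose(2, 0, 1)
```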

【6】 A Deep-Bayesian Framework for Adaptive Speech Duration Modification 标题:一种自适应语音时长调整的深度贝叶斯框架

作者:Ravi Shankar,Archana Venkataraman 机构: The authors are with the department of Electrical and Computer Engineering at the Johns Hopkins University 备注:6 pages, 7 figures 链接:https://arxiv.org/abs/2107.04973 摘要:本文提出了第一种自适应调整语音信号持续时间的方法。我们的方法使用贝叶斯框架来定义一个连接输入话语与目标话语各帧的潜在注意图。我们训练一个掩蔽卷积编解码网络,通过平均绝对误差损失函数的随机版本来产生这个注意图;我们的模型还使用编码器嵌入来预测目标语音信号的长度,预测长度决定解码器运行的步数。在推理过程中,我们生成注意图,作为给定输入语音和未知目标语音信号之间相似度矩阵的代理;利用这个相似矩阵,我们计算两个信号之间对齐的规整路径。实验表明,在语音转换和情感转换任务上,这种自适应框架产生的结果与依赖已知目标信号的动态时间规整(DTW)相当。我们还表明,我们的技术生成的语音质量很高,可与最先进的声码器相媲美。 摘要:We propose the first method to adaptively modify the duration of a given speech signal. Our approach uses a Bayesian framework to define a latent attention map that links frames of the input and target utterances. We train a masked convolutional encoder-decoder network to produce this attention map via a stochastic version of the mean absolute error loss function; our model also predicts the length of the target speech signal using the encoder embeddings. The predicted length determines the number of steps for the decoder operation. During inference, we generate the attention map as a proxy for the similarity matrix between the given input speech and an unknown target speech signal. Using this similarity matrix, we compute a warping path of alignment between the two signals. Our experiments demonstrate that this adaptive framework produces similar results to dynamic time warping, which relies on a known target signal, on both voice conversion and emotion conversion tasks. We also show that our technique results in a high quality of generated speech that is on par with state-of-the-art vocoders.

强化学习(4篇)

【1】 Behavior Constraining in Weight Space for Offline Reinforcement Learning 标题:离线强化学习中权空间中的行为约束

作者:Phillip Swazinna,Steffen Udluft,Daniel Hein,Thomas Runkler 机构:) Siemens Technology - Learning Systems - Germany, ) Technical University of Munich - Dept. of Informatics - Germany 备注:Accepted at ESANN 2021 链接:https://arxiv.org/abs/2107.05479 摘要:在离线强化学习中,需要从单个预先收集的数据集中学习策略。通常的做法是在训练期间对策略进行正则化:添加基于数据生成策略与所训练策略动作分布之间差异的惩罚项,使策略表现出与数据生成策略相似的行为。我们提出了一种新算法,它转而将策略直接约束在权值空间中,并通过实验验证了其有效性。 摘要:In offline reinforcement learning, a policy needs to be learned from a single pre-collected dataset. Typically, policies are thus regularized during training to behave similarly to the data generating policy, by adding a penalty based on a divergence between action distributions of generating and trained policy. We propose a new algorithm, which constrains the policy directly in its weight space instead, and demonstrate its effectiveness in experiments.
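下面以最小示意说明"在权值空间中直接约束策略"的思路:以行为克隆得到的参考权值为锚点,在策略目标中加入权值距离惩罚,而不是约束动作分布。惩罚形式与系数 beta 为示例假设,并非论文给出的具体算法。

```python
import torch

def weight_space_penalty(policy, ref_params, beta=1.0):
    # 将策略参数约束在参考(如行为克隆)权值附近,而非约束动作分布
    return beta * sum(((p - p0) ** 2).sum()
                      for p, p0 in zip(policy.parameters(), ref_params))

# 训练循环中的用法示意:
# ref_params = [p.detach().clone() for p in bc_policy.parameters()]
# loss = -estimated_return(policy, batch) + weight_space_penalty(policy, ref_params)
```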

【2】 CoBERL: Contrastive BERT for Reinforcement Learning 标题:CoBERL:用于强化学习的对比BERT

作者:Andrea Banino,Adrià Puidomenech Badia,Jacob Walker,Tim Scholtes,Jovana Mitrovic,Charles Blundell 机构:DeepMind, London, UK 备注:9 pages, 2 figures, 6 tables 链接:https://arxiv.org/abs/2107.05431 摘要:许多强化学习(RL)智能体需要大量的经验来解决任务。我们提出了用于RL的对比BERT(CoBERL),它结合了一种新的对比损失和一种混合LSTM-Transformer结构,以应对提高数据效率的挑战。CoBERL能够在广泛领域中直接从像素进行高效、鲁棒的学习。我们将双向掩码预测与最近对比方法的一种推广相结合,为RL中的Transformer学习更好的表示,而无需手工设计的数据增强。我们发现,CoBERL在整个Atari套件、一组控制任务和一个具有挑战性的3D环境中持续提升性能。 摘要:Many reinforcement learning (RL) agents require a large amount of experience to solve tasks. We propose Contrastive BERT for RL (CoBERL), an agent that combines a new contrastive loss and a hybrid LSTM-transformer architecture to tackle the challenge of improving data efficiency. CoBERL enables efficient, robust learning from pixels across a wide range of domains. We use bidirectional masked prediction in combination with a generalization of recent contrastive methods to learn better representations for transformers in RL, without the need of hand engineered data augmentations. We find that CoBERL consistently improves performance across the full Atari suite, a set of control tasks and a challenging 3D environment.

【3】 A Simple Reward-free Approach to Constrained Reinforcement Learning 标题:一种简单的无奖励约束强化学习方法

作者:Sobhan Miryoosefi,Chi Jin 机构:Princeton University 链接:https://arxiv.org/abs/2107.05216 摘要:在约束强化学习(RL)中,学习智能体不仅要优化总体奖励,还要满足额外的安全性、多样性或预算约束。因此,现有的约束RL解决方案需要若干新的算法组件,这些组件与标准RL明显不同。另一方面,无奖励RL是在无约束文献中独立发展起来的:它不使用奖励信息即可学习转移动力学,因此自然能够在公共动力学下处理多目标RL。本文将无奖励RL和约束RL联系起来。特别地,我们提出了一个简单的元算法:给定任何无奖励RL预言机(oracle),可接近性(approachability)问题和约束RL问题都可以直接求解,而样本复杂度的开销可以忽略不计。利用现有的无奖励RL求解器,我们的框架在表格型MDP设置下为约束RL提供了紧的样本复杂度结果,在一个与时间范围(horizon)相关的因子内匹配现有最佳结果;我们的框架还可直接扩展到表格型两人马尔可夫博弈的设置,并给出了线性函数逼近下约束RL的新结果。 摘要:In constrained reinforcement learning (RL), a learning agent seeks to not only optimize the overall reward but also satisfy the additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that are notably different from standard RL. On the other hand, reward-free RL is independently developed in the unconstrained literature, which learns the transition dynamics without using the reward information, and thus naturally capable of addressing RL with multiple objectives under the common dynamics. This paper bridges reward-free RL and constrained RL. Particularly, we propose a simple meta-algorithm such that given any reward-free RL oracle, the approachability and constrained RL problems can be directly solved with negligible overheads in sample complexity. Utilizing the existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence; our framework directly extends to a setting of tabular two-player Markov games, and gives a new result for constrained RL with linear function approximation.

【4】 Polynomial Time Reinforcement Learning in Correlated FMDPs with Linear Value Functions 标题:具有线性值函数的相关FMDP中的多项式时间强化学习

作者:Siddartha Devic,Zihao Deng,Brendan Juba 机构:University of Southern California, Department of Computer Science & Engineering, Washington University in St. Louis 备注:30 pages, 1 figure 链接:https://arxiv.org/abs/2107.05187 摘要:在实践中,许多强化学习(RL)环境具有巨大的状态空间,这些状态空间往往可以用"因子化"结构来紧凑地描述,并用因子化马尔可夫决策过程(FMDP)建模。我们提出了第一个不依赖oracle规划器的FMDP多项式时间RL算法;它不要求线性转移模型,只要求一个关于因子分解具有适当局部基的线性值函数。在这一假设下,我们通过为凸优化构造一个高效的分离预言机,在多项式时间内求解FMDP。重要的是,与先前工作不同,我们不假设各因子上的转移是相互独立的。 摘要:Many reinforcement learning (RL) environments in practice feature enormous state spaces that may be described compactly by a "factored" structure, that may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL with FMDPs that does not rely on an oracle planner, and instead of requiring a linear transition model, only requires a linear value function with a suitable local basis with respect to the factorization. With this assumption, we can solve FMDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work, we do not assume that the transitions on various factors are independent.

元学习(1篇)

【1】 Meta-learning PINN loss functions 标题:元学习PINN损失函数

作者:Apostolos F Psaros,Kenji Kawaguchi,George Em Karniadakis 机构:Division of Applied Mathematics, Brown University, Providence, RI, USA, Center of Mathematical Sciences and Applications, Harvard University, Cambridge, MA, USA 链接:https://arxiv.org/abs/2107.05544 摘要:提出了一种离线发现物理信息神经网络(PINN)损失函数的元学习方法。我们扩展了先前关于元学习的工作,开发了一种基于梯度的元学习算法,用于处理由参数化偏微分方程(PDE)定义、并用PINN求解的多样化任务分布。此外,基于新理论,我们确定了PINN问题中元学习损失的两个理想性质,并通过提出一种新的正则化方法或使用损失函数的特定参数化来实现这两个性质。在计算实例中,元学习得到的损失在测试时用于处理回归和PDE任务分布。我们的结果表明,即使在分布外元测试中,使用跨任务共享的离线学习损失函数也能显著提升性能:我们求解的测试任务不属于元训练所用的任务分布,并且采用的PINN结构也不同于元训练所用的结构。为了更好地理解所提方法的能力和局限性,我们考虑了损失函数的多种参数化,并讨论了不同的算法设计选项及其对元学习性能的可能影响。 摘要:We propose a meta-learning technique for offline discovery of physics-informed neural network (PINN) loss functions. We extend earlier works on meta-learning, and develop a gradient-based meta-learning algorithm for addressing diverse task distributions based on parametrized partial differential equations (PDEs) that are solved with PINNs. Furthermore, based on new theory we identify two desirable properties of meta-learned losses in PINN problems, which we enforce by proposing a new regularization method or using a specific parametrization of the loss function. In the computational examples, the meta-learned losses are employed at test time for addressing regression and PDE task distributions. Our results indicate that significant performance improvement can be achieved by using a shared-among-tasks offline-learned loss function even for out-of-distribution meta-testing. In this case, we solve for test tasks that do not belong to the task distribution used in meta-training, and we also employ PINN architectures that are different from the PINN architecture used in meta-training. To better understand the capabilities and limitations of the proposed method, we consider various parametrizations of the loss function and describe different algorithm design options and how they may affect meta-learning performance.
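下面给出基于梯度的损失元学习的一步展开示意:内循环用参数化损失更新模型,外循环用任务验证集上的MSE对损失参数求梯度。其中 parametrized_loss 的具体形式以及 task.sample_train/sample_val 接口均为示例假设,并非论文给出的参数化。

```python
import torch
from torch.func import functional_call

def parametrized_loss(pred, target, w):
    # 示例假设:以可学习参数 w 调制的加权平方误差充当"待元学习的损失"
    return (torch.sigmoid(w) * (pred - target) ** 2).mean()

def one_step_meta_grad(model, task, w, inner_lr=1e-2):
    # 一步展开的基于梯度的元学习:内循环用元损失更新模型参数,
    # 外循环以验证集 MSE 评价该损失,并对损失参数 w 求梯度
    params = dict(model.named_parameters())
    x, y = task.sample_train()
    inner = parametrized_loss(functional_call(model, params, (x,)), y, w)
    grads = torch.autograd.grad(inner, tuple(params.values()), create_graph=True)
    fast = {k: p - inner_lr * g for (k, p), g in zip(params.items(), grads)}
    xv, yv = task.sample_val()
    val = ((functional_call(model, fast, (xv,)) - yv) ** 2).mean()
    return torch.autograd.grad(val, w)[0]   # 用于更新损失参数 w
```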

符号|符号学习(1篇)

【1】 MidiBERT-Piano: Large-scale Pre-training for Symbolic Music Understanding 标题:MidiBERT-Piano:面向符号音乐理解的大规模预训练

作者:Yi-Hui Chou,I-Chun Chen,Chin-Jui Chang,Joann Ching,Yi-Hsuan Yang 机构: YH Chou is also affiliated with National Taiwan University; IC Chen with National Tsing Hua University; and YH Yang with Taiwan AI Labs (e-mail 链接:https://arxiv.org/abs/2107.05223 摘要:本文尝试利用BERT的掩码语言建模方法,在4166首复调钢琴MIDI文件上预训练一个12层Transformer模型,以解决若干符号域的判别式音乐理解任务。这包括两个音符级分类任务,即旋律提取和力度(velocity)预测,以及两个序列级分类任务,即作曲家分类和情感分类。我们发现,给定预训练好的Transformer,只需不到10个epoch的微调,我们的模型就优于基于循环神经网络的基线。消融研究表明,即使预训练阶段没有见过下游任务的任何MIDI数据,预训练仍然有效;而在微调阶段冻结Transformer的自注意力层会略微降低性能。这项工作中使用的所有五个数据集以及我们预训练和微调模型的检查点均已公开。因此,我们的研究可以作为符号域音乐理解的一个基准。 摘要:This paper presents an attempt to employ the mask language modeling approach of BERT to pre-train a 12-layer Transformer model over 4,166 pieces of polyphonic piano MIDI files for tackling a number of symbolic-domain discriminative music understanding tasks. These include two note-level classification tasks, i.e., melody extraction and velocity prediction, as well as two sequence-level classification tasks, i.e., composer classification and emotion classification. We find that, given a pre-trained Transformer, our models outperform recurrent neural network based baselines with less than 10 epochs of fine-tuning. Ablation studies show that the pre-training remains effective even if none of the MIDI data of the downstream tasks are seen at the pre-training stage, and that freezing the self-attention layers of the Transformer at the fine-tuning stage slightly degrades performance. All the five datasets employed in this work are publicly available, as well as checkpoints of our pre-trained and fine-tuned models. As such, our research can be taken as a benchmark for symbolic-domain music understanding.
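下面是BERT式掩码预训练在MIDI标记序列上的最小示意:随机遮盖部分事件标记并训练模型复原。MASK_ID、遮盖比例等均为示例假设,并非论文给出的具体配置。

```python
import torch

MASK_ID = 0  # 示例假设:词表中 0 为 [MASK] 标记

def mask_tokens(tokens, mask_prob=0.15):
    # BERT 式掩码语言建模:随机遮盖部分 MIDI 事件标记,训练模型复原
    labels = tokens.clone()
    mask = torch.rand(tokens.shape) < mask_prob
    labels[~mask] = -100            # 仅在被遮盖位置计算损失
    inputs = tokens.clone()
    inputs[mask] = MASK_ID
    return inputs, labels

# 训练时配合交叉熵使用,例如:
# loss = F.cross_entropy(logits.view(-1, vocab_size), labels.view(-1),
#                        ignore_index=-100)
```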

分层学习(1篇)

【1】 Personalized Federated Learning via Maximizing Correlation with Sparse and Hierarchical Extensions 标题:通过最大化相关性的个性化联邦学习及其稀疏与分层扩展

作者:Yinchuan Li,Xiaofeng Liu,Xu Zhang,Yunfeng Shao,Qing Wang,Yanhui Geng 机构: Tianjin University 链接:https://arxiv.org/abs/2107.05330 摘要:联邦学习(FL)是一种无需获取客户端私有数据即可训练全局模型的协作式机器学习技术。FL面临的主要挑战是客户端之间的统计差异、客户端设备有限的计算能力,以及服务器和客户端之间过大的通信开销和过长的延迟。针对这些问题,本文提出了一种新的通过最大化相关性的个性化联邦学习方法(pFedMac),并进一步将其扩展到稀疏和分层模型。通过最小化包含近似L1范数和层次相关特性的损失函数,该方法提高了在统计多样性数据上的性能,并减少了网络所需的通信和计算负载。理论证明pFedMac的性能优于基于L2范数距离的个性化方法。实验表明,与最先进的个性化方法及其扩展相比,这种稀疏分层个性化结构具有明显优势(例如,在异构和非独立同分布数据下,pFedMac在MNIST上达到99.75%的准确率,在Synthetic数据集上达到87.27%的准确率)。 摘要:Federated Learning (FL) is a collaborative machine learning technique to train a global model without obtaining clients' private data. The main challenges in FL are statistical diversity among clients, limited computing capability among client equipments and the excessive communication overhead and long latency between server and clients. To address these problems, we propose a novel personalized federated learning via maximizing correlation (pFedMac), and further extend it to sparse and hierarchical models. By minimizing loss functions including the properties of an approximated L1-norm and the hierarchical correlation, the performance on statistical diversity data is improved and the communicational and computational loads required in the network are reduced. Theoretical proofs show that pFedMac performs better than the L2-norm distance based personalization methods. Experimentally, we demonstrate the benefits of this sparse hierarchical personalization architecture compared with the state-of-the-art personalization methods and their extensions (e.g. pFedMac achieves 99.75% accuracy on MNIST and 87.27% accuracy on Synthetic under heterogeneous and non-i.i.d data distributions)

医学相关(4篇)

【1】 Learned super resolution ultrasound for improved breast lesion characterization 标题:学习超分辨率超声以改进乳腺病变定性

作者:Or Bar-Shira,Ahuva Grubstein,Yael Rapson,Dror Suhami,Eli Atar,Keren Peri-Hanania,Ronnie Rosen,Yonina C. Eldar 机构: Department of Computer Science and Applied Mathematics, Weizmann Institute of Science; Tel Aviv University 备注:to be published in MICCAI 2021 proceedings 链接:https://arxiv.org/abs/2107.05270 摘要:乳腺癌是女性最常见的恶性肿瘤。乳腺微钙化、肿块等影像学表现以及超声检查中肿块的形态特征是检测肿瘤的主要诊断目标。然而,这些成像方式的特异性有待提高。一个主要的替代靶点是新生血管生成:在病理状态下,它促进多种肿瘤的发展和转移灶的形成。因此,通过微血管的可视化来显示新生血管可能非常重要。超分辨率超声定位显微术能够在毛细血管水平对微血管成像。然而,要将超分辨率超声转化为临床应用,还需解决重建时间长、依赖系统点扩散函数(PSF)先验知识以及超声造影剂(UCA)可分性等挑战。在这项工作中,我们使用一个深度神经网络结构,有效利用信号结构来应对这些挑战。我们展示了用临床超声扫描仪采集的三种不同乳腺病变的活体人体结果。借助训练好的网络,微血管结构可以在短时间内恢复,无需PSF先验知识,也不要求UCA可分。每个恢复结果都呈现出与已知组织学结构相对应的不同结构。这项研究证明了基于临床扫描仪的活体人体超分辨成像的可行性,可提高超声对不同乳腺病变的特异性,并促进超声在乳腺疾病诊断中的应用。 摘要:Breast cancer is the most common malignancy in women. Mammographic findings such as microcalcifications and masses, as well as morphologic features of masses in sonographic scans, are the main diagnostic targets for tumor detection. However, improved specificity of these imaging modalities is required. A leading alternative target is neoangiogenesis. When pathological, it contributes to the development of numerous types of tumors, and the formation of metastases. Hence, demonstrating neoangiogenesis by visualization of the microvasculature may be of great importance. Super resolution ultrasound localization microscopy enables imaging of the microvasculature at the capillary level. Yet, challenges such as long reconstruction time, dependency on prior knowledge of the system Point Spread Function (PSF), and separability of the Ultrasound Contrast Agents (UCAs), need to be addressed for translation of super-resolution US into the clinic. In this work we use a deep neural network architecture that makes effective use of signal structure to address these challenges. We present in vivo human results of three different breast lesions acquired with a clinical US scanner. By leveraging our trained network, the microvasculature structure is recovered in a short time, without prior PSF knowledge, and without requiring separability of the UCAs. Each of the recoveries exhibits a different structure that corresponds with the known histological structure. This study demonstrates the feasibility of in vivo human super resolution, based on a clinical scanner, to increase US specificity for different breast lesions and promotes the use of US in the diagnosis of breast pathologies.

【2】 One Map Does Not Fit All: Evaluating Saliency Map Explanation on Multi-Modal Medical Images 标题:一图不能通用:评估多模态医学图像上的显著图解释

作者:Weina Jin,Xiaoxiao Li,Ghassan Hamarneh 机构: School of Computing Science, Simon Fraser University, Canada; Department of Electrical and Computer Engineering, The University of British Columbia 链接:https://arxiv.org/abs/2107.05047 摘要:能够向临床最终用户解释预测,是利用人工智能模型支持临床决策的必要条件。对于医学图像,显著图是最常见的解释形式:它突出对AI模型预测重要的特征。尽管已经提出了许多显著图方法,但它们在解释多模态医学图像上的决策时效果如何尚不清楚,因为在多模态图像中,每个模态/通道对同一底层生物医学现象承载着不同的临床意义。理解这些依赖模态的特征,对临床用户解读AI决策至关重要。为了解决这一临床上重要但技术上被忽视的问题,我们提出了MSFI(模态特定特征重要性)度量,用以检验显著图能否突出模态特定的重要特征。MSFI编码了关于模态优先级和模态特定特征定位的临床需求。我们对16种常用显著图方法的评估(包括一项临床医生用户研究)表明,尽管大多数显著图方法总体上捕获了模态重要性信息,但多数方法未能一致且准确地突出模态特定的重要特征。评估结果可指导显著图方法的选择,并为面向临床应用提出新的显著图方法提供了启示。 摘要:Being able to explain the prediction to clinical end-users is a necessity to leverage the power of AI models for clinical decision support. For medical images, saliency maps are the most common form of explanation. The maps highlight important features for AI model's prediction. Although many saliency map methods have been proposed, it is unknown how well they perform on explaining decisions on multi-modal medical images, where each modality/channel carries distinct clinical meanings of the same underlying biomedical phenomenon. Understanding such modality-dependent features is essential for clinical users' interpretation of AI decisions. To tackle this clinically important but technically ignored problem, we propose the MSFI (Modality-Specific Feature Importance) metric to examine whether saliency maps can highlight modality-specific important features. MSFI encodes the clinical requirements on modality prioritization and modality-specific feature localization. Our evaluations on 16 commonly used saliency map methods, including a clinician user study, show that although most saliency map methods captured modality importance information in general, most of them failed to highlight modality-specific important features consistently and precisely. The evaluation results guide the choices of saliency map methods and provide insights to propose new ones targeting clinical applications.

【3】 Anatomy of Domain Shift Impact on U-Net Layers in MRI Segmentation 标题:剖析MRI分割中域偏移对U-Net各层的影响

作者:Ivan Zakazov,Boris Shirokikh,Alexey Chernyavskiy,Mikhail Belyaev 机构:Philips Research, Moscow, Russia, Skolkovo Institute of Science and Technology, Moscow, Russia 备注:Accepted for MICCAI-2021 conference 链接:https://arxiv.org/abs/2107.04914 摘要:领域自适应(DA)方法广泛应用于医学图像分割任务,以解决训练(源)数据和测试(目标)数据分布不同的问题。我们考虑目标域标注样本数量有限的有监督DA任务,它对应于最相关的临床设置之一:在尽可能少的标注数据上建立足够精确的模型。现有方法大多对预训练卷积神经网络(CNN)的特定层进行微调,然而对于哪些层更适合微调尚无共识,例如,低层域偏移的图像适合微调前几层,而高层域偏移的图像适合微调更深的层。为此,我们提出了SpotTUnet:一种自动选择最优微调层的CNN结构。更具体地说,在目标域上,我们的方法还学习一个策略,指示某一特定层应当被微调还是直接复用预训练网络的权值。我们表明,即使在标注数据极度匮乏的情况下,我们的方法也能达到与最好的非灵活微调方法相当的性能。其次,我们展示了SpotTUnet策略提供了域偏移对网络影响的逐层可视化,这可进一步用于开发鲁棒的域泛化方法。为了全面评估SpotTUnet的性能,我们使用了一个具有明显域偏移的公开脑部MR图像数据集(CC359),并发布了可复现的实验管道。 摘要:Domain Adaptation (DA) methods are widely used in medical image segmentation tasks to tackle the problem of differently distributed train (source) and test (target) data. We consider the supervised DA task with a limited number of annotated samples from the target domain. It corresponds to one of the most relevant clinical setups: building a sufficiently accurate model on the minimum possible amount of annotated data. Existing methods mostly fine-tune specific layers of the pretrained Convolutional Neural Network (CNN). However, there is no consensus on which layers are better to fine-tune, e.g. the first layers for images with low-level domain shift or the deeper layers for images with high-level domain shift. To this end, we propose SpotTUnet - a CNN architecture that automatically chooses the layers which should be optimally fine-tuned. More specifically, on the target domain, our method additionally learns the policy that indicates whether a specific layer should be fine-tuned or reused from the pretrained network. We show that our method performs at the same level as the best of the nonflexible fine-tuning methods even under the extreme scarcity of annotated data. Secondly, we show that SpotTUnet policy provides a layer-wise visualization of the domain shift impact on the network, which could be further used to develop robust domain generalization methods. In order to extensively evaluate SpotTUnet performance, we use a publicly available dataset of brain MR images (CC359), characterized by explicit domain shift. We release a reproducible experimental pipeline.

【4】 DebiasedDTA: Model Debiasing to Boost Drug-Target Affinity Prediction 标题:DebiasedDTA:通过模型去偏提升药物-靶点亲和力预测

作者:Rıza Özçelik,Alperen Bağ,Berk Atıl,Arzucan Özgür,Elif Özkırımlı 机构:Department of Computer Engineering, Boğaziçi University, İstanbul, Turkey and Data and Analytics Chapter, Pharma International Informatics, F. Hoffmann-La Roche AG, Switzerland. +These authors contributed equally to the work. 链接:https://arxiv.org/abs/2107.05556 摘要:动机:能够精确识别高亲和力蛋白质-化合物对的计算模型可以加速药物发现流程。这些模型旨在通过药物-靶点相互作用数据集学习结合机制,并利用所学知识预测任意蛋白质-化合物对的亲和力。然而,它们所依赖的数据集带有误导性模式,使模型偏向于记忆数据集特定的生物分子性质,而不是学习结合机制。由于对结合机制的关注不足,所得模型在预测药物-靶点亲和力(DTA)时表现欠佳,尤其是在全新(de novo)生物分子之间。在此,我们提出了DebiasedDTA,这是第一种通过规避数据集偏差来提升对新生物分子亲和力预测的模型去偏方法。DebiasedDTA采用集成学习和样本权重自适应来识别并规避偏差,适用于几乎所有现有的DTA预测模型。结果:实验表明,DebiasedDTA能够在预测新生物分子之间的相互作用时提升模型表现。已知生物分子同样受益于这种性能提升,且随着测试生物分子与训练集差异增大,提升幅度也随之放大。实验还表明,DebiasedDTA可以增强不同输入和模型结构的DTA预测模型,并能规避不同来源的偏差。可用性:源代码、模型和数据集见 https://github.com/boun-tabi/debiaseddta-reproduce 联系人:arzucan.ozgur@boun.edu.tr, elif.ozkirimli@roche.com 摘要:Motivation: Computational models that accurately identify high-affinity protein-compound pairs can accelerate drug discovery pipelines. These models aim to learn binding mechanics through drug-target interaction datasets and use the learned knowledge while predicting the affinity of any protein-compound pair. However, the datasets they rely on bear misleading patterns that bias models towards memorizing dataset-specific biomolecule properties, instead of learning binding mechanics. Insufficiently focused on the binding mechanics, the resulting models struggle while predicting the drug-target affinities (DTA), especially between de novo biomolecules. Here we present DebiasedDTA, the first model debiasing approach that avoids dataset biases in order to boost the affinity prediction on novel biomolecules. DebiasedDTA uses ensemble learning and weight sample adaptation for bias identification and avoidance and is applicable to almost all existing DTA prediction models. Results: The results show that DebiasedDTA can boost models while predicting the interactions between novel biomolecules. Known biomolecules also benefit from the performance boost, though the boost is amplified as the test biomolecules become more dissimilar to the training set. The experiments also show that DebiasedDTA can augment the DTA prediction models of different input and model structures and can avoid biases of different sources. Availability: The source code, the models, and the data sets are available at https://github.com/boun-tabi/debiaseddta-reproduce Contact: arzucan.ozgur@boun.edu.tr, elif.ozkirimli@roche.com

推荐(1篇)

【1】 Sliding Spectrum Decomposition for Diversified Recommendation 标题:用于多样化推荐的滑动谱分解

作者:Yanhua Huang,Weikun Wang,Lei Zhang,Ruiwen Xu 机构:Xiaohongshu Inc., Shanghai, China 备注:In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD '21), August 14--18, 2021, Virtual Event, Singapore 链接:https://arxiv.org/abs/2107.05204 摘要:内容流(content feed)是一种推荐一系列内容供用户浏览和互动的产品形态,已在社交媒体平台上获得巨大人气。在本文中,我们提出用时间序列分析技术,从项目序列的角度研究这种场景下的多样性问题。我们推导了一种称为滑动谱分解(SSD)的方法,它刻画了用户在浏览长项目序列时对多样性的感知。我们还分享了在长尾效应下设计和实现一种适用于精确相似度度量的项目嵌入方法的经验。两者结合,目前已在小红书App的生产推荐系统中全面实现和部署,该系统每天为数千万用户提供主要的Explore Feed产品。我们通过理论分析、离线实验和在线A/B测试验证了该方法的有效性和效率。 摘要:Content feed, a type of product that recommends a sequence of items for users to browse and engage with, has gained tremendous popularity among social media platforms. In this paper, we propose to study the diversity problem in such a scenario from an item sequence perspective using time series analysis techniques. We derive a method called sliding spectrum decomposition (SSD) that captures users' perception of diversity in browsing a long item sequence. We also share our experiences in designing and implementing a suitable item embedding method for accurate similarity measurement under long tail effect. Combined together, they are now fully implemented and deployed in Xiaohongshu App's production recommender system that serves the main Explore Feed product for tens of millions of users every day. We demonstrate the effectiveness and efficiency of the method through theoretical analysis, offline experiments and online A/B tests.
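下面用一个最小示意说明"滑动窗口+谱分解"式的序列多样性度量思路:对项目嵌入序列取滑动窗口,以奇异值谱的有效秩近似窗口内内容的多样性。具体分解与度量形式为示例假设,并非论文中SSD的原始定义。

```python
import numpy as np

def effective_rank(window):
    # 奇异值谱的熵给出"有效秩":谱越平坦,说明窗口内项目越多样
    s = np.linalg.svd(window, compute_uv=False)
    p = s / s.sum()
    return float(np.exp(-(p * np.log(p + 1e-12)).sum()))

def sliding_spectrum_diversity(item_embs, win=10):
    # item_embs: (序列长度, 嵌入维度);对每个滑动窗口计算谱多样性
    return [effective_rank(item_embs[i:i + win])
            for i in range(len(item_embs) - win + 1)]

scores = sliding_spectrum_diversity(np.random.randn(50, 16), win=10)
```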

聚类(1篇)

【1】 Cluster Regularization via a Hierarchical Feature Regression 标题:基于分层特征回归的聚类正则化

作者:Johann Pfitzinger 机构:Goethe University, Frankfurt am Main 链接:https://arxiv.org/abs/2107.04831 摘要:具有高维非正交预测变量集的预测任务对基于最小二乘的拟合过程构成挑战。已有大量富有成效的文献讨论了各种正则化方法,以提高参数估计的样本外鲁棒性。本文提出了一种新的基于聚类的正则化方法,即层次特征回归(HFR):它借鉴机器学习和图论领域的思想,沿预测变量集的一个有监督层次表示来估计参数,将参数向组目标收缩。该方法的创新之处在于能够以内生方式估计预测变量分组的最优构成以及组目标。HFR可以看作一种有监督的因子回归,其收缩强度由对拟合过程中所捕获的特异性变异程度的惩罚控制。该方法展现出良好的预测精度和通用性,在一组多样化的模拟回归任务(包括稠密、稀疏和分组数据生成过程)中优于一组基准正则化估计器。我们用一个经济增长预测的应用来说明HFR在经验环境中的有效性,并与若干频率学派和贝叶斯方法进行了有利的比较。 摘要:Prediction tasks with high-dimensional nonorthogonal predictor sets pose a challenge for least squares based fitting procedures. A large and productive literature exists, discussing various regularized approaches to improving the out-of-sample robustness of parameter estimates. This paper proposes a novel cluster-based regularization - the hierarchical feature regression (HFR) -, which mobilizes insights from the domains of machine learning and graph theory to estimate parameters along a supervised hierarchical representation of the predictor set, shrinking parameters towards group targets. The method is innovative in its ability to estimate optimal compositions of predictor groups, as well as the group targets endogenously. The HFR can be viewed as a supervised factor regression, with the strength of shrinkage governed by a penalty on the extent of idiosyncratic variation captured in the fitting process. The method demonstrates good predictive accuracy and versatility, outperforming a panel of benchmark regularized estimators across a diverse set of simulated regression tasks, including dense, sparse and grouped data generating processes. An application to the prediction of economic growth is used to illustrate the HFR's effectiveness in an empirical setting, with favorable comparisons to several frequentist and Bayesian alternatives.
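下面给出"向组目标收缩"这一核心思想的最小示意:先对预测变量做层次聚类得到分组,再在最小二乘目标中加入把组内系数拉向组均值的惩罚。分组依据与梯度处理均为简化假设(每步将组均值视为固定目标),并非HFR的内生估计过程。

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def group_shrinkage_fit(X, y, n_groups=3, lam=1.0, lr=1e-2, steps=2000):
    # 1) 依据预测变量相关性做层次聚类得到分组(HFR 的分组是有监督且内生的)
    d = 1 - np.abs(np.corrcoef(X.T))
    groups = fcluster(linkage(d[np.triu_indices_from(d, 1)], "average"),
                      n_groups, criterion="maxclust")
    beta = np.zeros(X.shape[1])
    for _ in range(steps):
        # 2) 梯度下降:最小二乘损失 + 将系数拉向所属组均值的惩罚
        target = np.array([beta[groups == g].mean() for g in groups])
        grad = -2 * X.T @ (y - X @ beta) / len(y) + 2 * lam * (beta - target)
        beta -= lr * grad          # 近似梯度:每步将组均值视为固定目标
    return beta, groups
```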

超分辨率|去噪|去模糊|去雾(2篇)

【1】 Details Preserving Deep Collaborative Filtering-Based Method for Image Denoising 标题:基于细节保持深度协同滤波的图像去噪方法

作者:Basit O. Alawode,Mudassir Masood,Tarig Ballal,Tareq Al-Naffouri 机构:King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia., King Abdullah University of Science and Technology, Thuwal, Saudi Arabia. 链接:https://arxiv.org/abs/2107.05115 摘要:尽管多年来各种去噪算法不断进步,但许多算法在去噪后仍不能很好地保留图像细节,原因在于它们对图像具有平滑效应。大多数基于神经网络的去噪算法在定量性能上优于经典去噪算法,但由于平滑效应,其定性(视觉)表现同样受损。本文提出了一种算法来解决这一缺陷:一种基于深度协同滤波的图像去噪算法(Deep-CoFiB)。该算法利用一组经过优化的神经网络模型,在稀疏域对图像块进行协同去噪,由此得到一种能够在噪声去除与细节保留之间取得良好折中的快速算法。大量实验表明,Deep-CoFiB在定量(PSNR和SSIM)与定性(视觉)两方面均优于许多最先进的去噪算法。 摘要:In spite of the improvements achieved by the several denoising algorithms over the years, many of them still fail at preserving the fine details of the image after denoising. This is as a result of the smooth-out effect they have on the images. Most neural network-based algorithms have achieved better quantitative performance than the classical denoising algorithms. However, they also suffer from qualitative (visual) performance as a result of the smooth-out effect. In this paper, we propose an algorithm to address this shortcoming. We propose a deep collaborative filtering-based (Deep-CoFiB) algorithm for image denoising. This algorithm performs collaborative denoising of image patches in the sparse domain using a set of optimized neural network models. This results in a fast algorithm that is able to excellently obtain a trade-off between noise removal and details preservation. Extensive experiments show that the DeepCoFiB performed quantitatively (in terms of PSNR and SSIM) and qualitatively (visually) better than many of the state-of-the-art denoising algorithms.

【2】 Dense-Sparse Deep CNN Training for Image Denoising 标题:用于图像去噪的稠密-稀疏深度CNN训练

作者:Basit O. Alawode,Mudassir Masood,Tarig Ballal,Tareq Al-Naffouri 机构:Electrical Engineering Department, Computer, Electrical, and Mathematical Sciences and Engineering, King Fahd University of Petroleum and Minerals, Dhahran, Saudi Arabia, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia 链接:https://arxiv.org/abs/2107.04857 摘要:近年来,卷积神经网络(CNN)等深度学习方法在图像去噪领域得到广泛应用,因为它们被证明能够超越BM3D等最先进的经典图像去噪算法。深度去噪CNN(DnCNN)使用许多前馈卷积层,并加入批量归一化和残差学习等正则化方法,显著提升了去噪性能,但代价是大量的可训练参数。本文致力于在保持相当性能的同时减少参数数量。我们的动机来自采用密集-稀疏-密集(DSD)训练方法所带来的性能提升。我们将这种训练方法扩展到一个精简的DnCNN(RDnCNN)网络,得到一个更快的去噪网络,其参数显著减少而性能与DnCNN相当。 摘要:Recently, deep learning (DL) methods such as convolutional neural networks (CNNs) have gained prominence in the area of image denoising. This is owing to their proven ability to surpass state-of-the-art classical image denoising algorithms such as BM3D. Deep denoising CNNs (DnCNNs) use many feedforward convolution layers with added regularization methods of batch normalization and residual learning to improve denoising performance significantly. However, this comes at the expense of a huge number of trainable parameters. In this paper, we address this issue by reducing the number of parameters while achieving a comparable level of performance. We derive motivation from the improved performance obtained by training networks using the dense-sparse-dense (DSD) training approach. We extend this training approach to a reduced DnCNN (RDnCNN) network resulting in a faster denoising network with significantly reduced parameters and comparable performance to the DnCNN.
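下面是密集-稀疏-密集(DSD)训练流程的最小示意:先正常训练,再按权值幅值剪枝并在掩码约束下重训练,最后恢复稠密结构继续微调。稀疏比例等超参数为示例假设。

```python
import torch

def magnitude_masks(model, sparsity=0.5):
    # 稀疏阶段:按幅值保留每层前 (1 - sparsity) 比例的权值
    masks = {}
    for name, p in model.named_parameters():
        if p.dim() > 1:                      # 只对卷积/全连接权值剪枝
            k = int(p.numel() * sparsity)
            thresh = p.abs().flatten().kthvalue(k).values
            masks[name] = (p.abs() > thresh).float()
    return masks

def apply_masks(model, masks):
    # 稀疏重训练时,每步优化后调用,使被剪枝的权值保持为零
    with torch.no_grad():
        for name, p in model.named_parameters():
            if name in masks:
                p.mul_(masks[name])

# 流程示意:train(model) -> masks = magnitude_masks(model)
#          -> 稀疏重训练(每步后 apply_masks) -> 丢弃掩码,稠密微调
```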

自动驾驶|车辆|车道检测等(1篇)

【1】 QoS Prediction for 5G Connected and Automated Driving 标题:5G联网自动驾驶的QoS预测

作者:Apostolos Kousaridas,Ramya Panthangi Manjunath,Jose Mauricio Perdomo,Chan Zhou,Ernst Zielinski,Steffen Schmitz,Andreas Pfadler 机构: Schmitz are with Volkswagen Infotainment GmbH Universitaetsstr 备注:7 pages, 5 figures, accepted for publication in the IEEE Communications Magazine 链接:https://arxiv.org/abs/2107.05000 摘要:5G通信系统可以支持许多先进的车辆对一切(V2X)用例的苛刻服务质量(QoS)要求。然而,所提供的服务质量的突然变化可能会影响安全和高效的驾驶,特别是自动化车辆的驾驶。因此,5G通信系统最近启用了QoS变化的预测和对车辆的这些预测变化的早期通知。此解决方案使车辆能够避免或减轻应用程序级别的突然QoS变化的影响。本文描述了5G通信系统如何生成QoS预测,并将其传递给V2X应用程序。以遥操作驾驶用例为例,分析了QoS预测方案的可行性。为QoS预测解决方案的开发提供了有用的建议,同时确定了开放的研究主题。 摘要:5G communication system can support the demanding quality-of-service (QoS) requirements of many advanced vehicle-to-everything (V2X) use cases. However, the safe and efficient driving, especially of automated vehicles, may be affected by sudden changes of the provided QoS. For that reason, the prediction of the QoS changes and the early notification of these predicted changes to the vehicles have been recently enabled by 5G communication systems. This solution enables the vehicles to avoid or mitigate the effect of sudden QoS changes at the application level. This article describes how QoS prediction could be generated by a 5G communication system and delivered to a V2X application. The tele-operated driving use case is used as an example to analyze the feasibility of a QoS prediction scheme. Useful recommendations for the development of a QoS prediction solution are provided, while open research topics are identified.

推理|分析|理解|解释(6篇)

【1】 Interpretable Mammographic Image Classification using Case-Based Reasoning and Deep Learning 标题:基于案例推理和深度学习的可解释乳腺X线图像分类

作者:Alina Jade Barnett,Fides Regina Schwartz,Chaofan Tao,Chaofan Chen,Yinhao Ren,Joseph Y. Lo,Cynthia Rudin 机构:Department of Computer Science, Duke University, USA, Department of Radiology, Duke University, USA, School of Computing and Information Science, University of Maine, USA, Department of Biomedical Engineering, Duke University, USA 备注:10 pages, 6 figures, accepted for oral presentation at the IJCAI-21 Workshop on Deep Learning, Case-Based Reasoning, and AutoML: Present and Future Synergies. arXiv admin note: substantial text overlap with arXiv:2103.12308 链接:https://arxiv.org/abs/2107.05605 摘要:当我们在高风险的医学环境中部署机器学习模型时,我们必须确保这些模型做出与已知医学科学一致的准确预测。固有的可解释网络通过解释每个决策背后的基本原理来满足这一需求,同时保持与黑盒模型相同或更高的精度。在这项工作中,我们提出了一个新的解释性神经网络算法,使用基于案例的推理乳腺摄影。为了帮助放射科医生做出决定,我们的网络提供了恶性肿瘤的预测和使用已知医学特征对预测的解释。为了得到有用的解释,该网络被设计成模仿放射科医生的推理过程:我们的网络首先通过将每个新图像与从训练图像中学习的一组原型图像部分进行比较来检测每个图像的临床相关语义特征,然后利用这些临床特征预测恶性肿瘤。与其他方法相比,我们的模型能够以相同或更高的精度检测临床特征(肿块边缘),对其预测提供更详细的解释,并且能够更好地区分图像中与分类相关的部分。 摘要:When we deploy machine learning models in high-stakes medical settings, we must ensure these models make accurate predictions that are consistent with known medical science. Inherently interpretable networks address this need by explaining the rationale behind each decision while maintaining equal or higher accuracy compared to black-box models. In this work, we present a novel interpretable neural network algorithm that uses case-based reasoning for mammography. Designed to aid a radiologist in their decisions, our network presents both a prediction of malignancy and an explanation of that prediction using known medical features. In order to yield helpful explanations, the network is designed to mimic the reasoning processes of a radiologist: our network first detects the clinically relevant semantic features of each image by comparing each new image with a learned set of prototypical image parts from the training images, then uses those clinical features to predict malignancy. Compared to other methods, our model detects clinical features (mass margins) with equal or higher accuracy, provides a more detailed explanation of its prediction, and is better able to differentiate the classification-relevant parts of the image.
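下面给出基于案例(原型)推理分类头的最小示意:将卷积特征图与一组可学习原型比较,以各原型的最大相似度作为证据,再线性组合得到类别得分。相似度形式为ProtoPNet风格的简化假设,并非论文的精确定义。

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    # 可学习原型:每个原型是一个小的特征图补丁,对应某种已知医学特征(如肿块边缘)
    def __init__(self, n_protos=10, dim=128, n_classes=2):
        super().__init__()
        self.protos = nn.Parameter(torch.randn(n_protos, dim, 1, 1))
        self.fc = nn.Linear(n_protos, n_classes, bias=False)

    def forward(self, feat):                          # feat: (B, dim, H, W)
        d = ((feat.unsqueeze(1) - self.protos) ** 2).sum(2)   # (B, P, H, W)
        dmin = d.amin(dim=(2, 3))                     # 每个原型的最近补丁距离
        sim = torch.log(1.0 / (dmin + 1e-4))          # 距离越小相似度越高(简化)
        return self.fc(sim), sim   # 类别得分 + 可解释的原型激活(决策依据)
```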

【2】 End-to-End Natural Language Understanding Pipeline for Bangla Conversational Agent 标题:面向孟加拉会话Agent的端到端自然语言理解流水线

作者:Fahim Shahriar Khan,Mueeze Al Mushabbir,Mohammad Sabik Irbaz,MD Abdullah Al Nasim 机构:Department of Computer Science and Engineering, Islamic University of Technology, Machine Learning Team, Pioneer Alpha Ltd. 备注:Under Review 链接:https://arxiv.org/abs/2107.05541 摘要:聊天机器人是为替代人与人之间的交互而构建的智能软件。然而,现有研究通常对孟加拉语这样的低资源语言支持不足。此外,由于社交媒体的日益普及,母语为孟加拉语的用户之间使用孟加拉语音译(多以英文字母书写)进行的交流也在增多。在本文中,我们提出了一种构建孟加拉语聊天机器人的新方法,旨在用作商务助理,能够以孟加拉语及英文音译的孟加拉语进行交流,并始终保持较高的置信度。由于没有现成的标注数据,我们需要以Rasa开源框架、fastText嵌入、Polyglot嵌入、Flask及其他系统为构建模块,完成整个机器学习生命周期(数据准备、机器学习建模和模型部署)。在处理偏斜的标注数据集时,我们尝试了不同的设置和管道以评估哪种方案效果最佳,并对观察到的结果给出可能的解释。最后,我们提出了一个意图分类和实体提取的管道,取得了合理的性能(准确率:83.02%,精确率:80.82%,召回率:83.02%,F1分数:80%)。 摘要:Chatbots are intelligent software built to be used as a replacement for human interaction. However, existing studies typically do not provide enough support for low-resource languages like Bangla. Moreover, due to the increasing popularity of social media, we can also see the rise of interactions in Bangla transliteration (mostly in English) among the native Bangla speakers. In this paper, we propose a novel approach to build a Bangla chatbot aimed to be used as a business assistant which can communicate in Bangla and Bangla Transliteration in English with high confidence consistently. Since annotated data was not available for this purpose, we had to work on the whole machine learning life cycle (data preparation, machine learning modeling, and model deployment) using Rasa Open Source Framework, fastText embeddings, Polyglot embeddings, Flask, and other systems as building blocks. While working with the skewed annotated dataset, we try out different setups and pipelines to evaluate which works best and provide possible reasoning behind the observed results. Finally, we present a pipeline for intent classification and entity extraction which achieves reasonable performance (accuracy: 83.02%, precision: 80.82%, recall: 83.02%, F1-score: 80%).

【3】 Longitudinal Correlation Analysis for Decoding Multi-Modal Brain Development 标题:解码多模态脑发育的纵向相关分析

作者:Qingyu Zhao,Ehsan Adeli,Kilian M. Pohl 机构: School of Medicine, Stanford University, Stanford, USA, Computer Science Department, Stanford University, Stanford, USA, Center of Health Sciences, SRI International, Menlo Park, USA 链接:https://arxiv.org/abs/2107.04724 摘要:从童年开始,人脑在一生中不断重组和重新布线。刻画这种复杂的大脑发育需要对纵向、多模态神经影像数据进行有效分析。在此,我们提出了一种这样的分析方法,称为纵向相关分析(LCA)。LCA首先基于自编码器将每个模态的输入降维为潜在表征,从而耦合两种模态的数据;然后通过自监督策略联合解缠两个方向(每个潜空间各一个)来关联两个潜空间,使潜表征沿这些方向的纵向变化在模态间最大程度相关。我们应用LCA分析了美国青少年酒精与神经发育联盟(NCANDA)中679名青少年的纵向T1加权和弥散加权MRI。不同于现有侧重横断面或单模态建模的方法,LCA成功地从数据中提取的形态学和弥散特征中揭示了耦合的宏观结构与微观结构大脑发育。在这些受试者的原始三维图像上重新测试LCA,成功复现了基于特征分析的发现。最后,LCA揭示的发育效应符合目前对青少年大脑成熟模式的理解。 摘要:Starting from childhood, the human brain restructures and rewires throughout life. Characterizing such complex brain development requires effective analysis of longitudinal and multi-modal neuroimaging data. Here, we propose such an analysis approach named Longitudinal Correlation Analysis (LCA). LCA couples the data of two modalities by first reducing the input from each modality to a latent representation based on autoencoders. A self-supervised strategy then relates the two latent spaces by jointly disentangling two directions, one in each space, such that the longitudinal changes in latent representations along those directions are maximally correlated between modalities. We applied LCA to analyze the longitudinal T1-weighted and diffusion-weighted MRIs of 679 youths from the National Consortium on Alcohol and Neurodevelopment in Adolescence. Unlike existing approaches that focus on either cross-sectional or single-modal modeling, LCA successfully unraveled coupled macrostructural and microstructural brain development from morphological and diffusivity features extracted from the data. A retesting of LCA on raw 3D image volumes of those subjects successfully replicated the findings from the feature-based analysis. Lastly, the developmental effects revealed by LCA were inline with the current understanding of maturational patterns of the adolescent brain.
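下面示意LCA的耦合目标:两模态各自经自编码器得到潜表征后,学习每个潜空间中的一个方向,使两模态潜表征沿这些方向的纵向变化最大相关。损失形式为示例假设,实际还需与各模态的重建损失联合训练。

```python
import torch

def longitudinal_correlation_loss(z1_t1, z1_t2, z2_t1, z2_t2, u1, u2):
    # z*_t1/z*_t2: 两个时间点、两种模态的潜表征 (N, d);u1/u2: 待学习的方向 (d,)
    d1 = (z1_t2 - z1_t1) @ (u1 / u1.norm())   # 模态1沿 u1 的纵向变化
    d2 = (z2_t2 - z2_t1) @ (u2 / u2.norm())   # 模态2沿 u2 的纵向变化
    d1, d2 = d1 - d1.mean(), d2 - d2.mean()
    corr = (d1 * d2).sum() / (d1.norm() * d2.norm() + 1e-8)  # Pearson 相关
    return -corr   # 最大化模态间相关 => 最小化负相关(与重建损失相加)
```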

【4】 A Topological-Framework to Improve Analysis of Machine Learning Model Performance 标题:一种改进机器学习模型性能分析的拓扑框架

作者:Henry Kvinge,Colby Wight,Sarah Akers,Scott Howland,Woongjo Choi,Xiaolong Ma,Luke Gosink,Elizabeth Jurrus,Keerti Kappagantula,Tegan H. Emerson 机构: Department of Mathematics, University of Washington, USA 备注:6 pages 链接:https://arxiv.org/abs/2107.04714 摘要:随着机器学习模型及其评估数据集的规模和复杂性不断增长,用少数汇总统计量来理解模型性能的做法变得越来越成问题。在真实场景中尤其如此:在这些场景中,了解模型在特定数据子群体上的失效情况至关重要。本文提出了一个评估机器学习模型的拓扑框架,其中数据集被视为模型在其上运行的一个"空间"。这为我们提供了一种有原则的方式,在全局层面(整个测试集上)和局部层面(特定子群体上)组织有关模型性能的信息。最后,我们描述了一种拓扑数据结构,即预层(presheaves),它为存储和分析不同子群体之间的模型性能提供了便利。 摘要:As both machine learning models and the datasets on which they are evaluated have grown in size and complexity, the practice of using a few summary statistics to understand model performance has become increasingly problematic. This is particularly true in real-world scenarios where understanding model failure on certain subpopulations of the data is of critical importance. In this paper we propose a topological framework for evaluating machine learning models in which a dataset is treated as a "space" on which a model operates. This provides us with a principled way to organize information about model performance at both the global level (over the entire test set) and also the local level (on specific subpopulations). Finally, we describe a topological data structure, presheaves, which offer a convenient way to store and analyze model performance between different subpopulations.

【5】 Hölder Bounds for Sensitivity Analysis in Causal Reasoning 标题:因果推理中灵敏度分析的Hölder界

作者:Serge Assaad,Shuxi Zeng,Henry Pfister,Fan Li,Lawrence Carin 机构: Department of Electrical & Computer Engineering, Duke University; Department of Statistical Science, Duke University 备注:Workshop on the Neglected Assumptions in Causal Inference at the International Conference on Machine Learning (ICML), 2021 链接:https://arxiv.org/abs/2107.04661 摘要:在存在未观测混杂因素U的情况下,我们研究处理T对结果Y的效应的区间估计。利用Hölder不等式,我们根据未测量混杂的程度(即U->T的强度和U->Y的强度),导出了一组关于混杂偏倚|E[Y|T=t]-E[Y|do(T=t)]|的界。当U独立于T,或给定T时U独立于Y(即不存在未观测混杂)时,这些界是紧的。我们关注该界的一个特例,它取决于分布p(U)与p(U|T=t)之间的总变差距离,以及条件期望结果E[Y|U=u,T=t]相对平均期望结果E[Y|T=t]的最大偏差(取遍U的所有可能取值)。我们讨论了对该界进行校准以获得处理效应区间估计的可能策略,并在合成和半合成数据集上对该界进行了实验验证。 摘要:We examine interval estimation of the effect of a treatment T on an outcome Y given the existence of an unobserved confounder U. Using Hölder's inequality, we derive a set of bounds on the confounding bias |E[Y|T=t]-E[Y|do(T=t)]| based on the degree of unmeasured confounding (i.e., the strength of the connection U->T, and the strength of U->Y). These bounds are tight either when U is independent of T or when U is independent of Y given T (when there is no unobserved confounding). We focus on a special case of this bound depending on the total variation distance between the distributions p(U) and p(U|T=t), as well as the maximum (over all possible values of U) deviation of the conditional expected outcome E[Y|U=u,T=t] from the average expected outcome E[Y|T=t]. We discuss possible calibration strategies for this bound to get interval estimates for treatment effects, and experimentally validate the bound using synthetic and semi-synthetic datasets.
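下述LaTeX推导给出此类界的一个标准推演思路(依据摘要描述重构的示意,取Hölder不等式 p=∞、q=1 的特例;常数与原文的确切形式可能不同)。设U满足后门准则,即 E[Y|do(T=t)] = ∫E[Y|U=u,T=t]p(u)du,并利用 ∫(p(u|T=t)-p(u))du = 0 先中心化再放缩:

```latex
\begin{align*}
\bigl|E[Y\mid T=t]-E[Y\mid do(T=t)]\bigr|
 &= \Bigl|\int \bigl(E[Y\mid U=u,T=t]-E[Y\mid T=t]\bigr)
          \bigl(p(u\mid T=t)-p(u)\bigr)\,du\Bigr| \\
 &\le \max_{u}\bigl|E[Y\mid U=u,T=t]-E[Y\mid T=t]\bigr|
      \cdot \bigl\|p(\cdot\mid T=t)-p(\cdot)\bigr\|_{1} \\
 &= 2\,\mathrm{TV}\bigl(p(U\mid T=t),\,p(U)\bigr)\,
      \max_{u}\bigl|E[Y\mid U=u,T=t]-E[Y\mid T=t]\bigr|.
\end{align*}
```

其中总变差距离项刻画 U->T 的强度,最大偏差项刻画 U->Y 的强度,与摘要所述特例一致。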

【6】 Convergence Analysis of Schrödinger-Föllmer Sampler without Convexity 标题:无凸性条件下Schrödinger-Föllmer采样器的收敛性分析

作者:Yuling Jiao,Lican Kang,Yanyan Liu,Youzhou Zhou 机构:School of Mathematics and Statistics, Wuhan University 备注:arXiv admin note: text overlap with arXiv:2106.10880 链接:https://arxiv.org/abs/2107.04766 摘要:Schrödinger-Föllmer采样器(SFS)是一种新颖而高效的方法,可在不要求遍历性的情况下,从可能未归一化的分布中采样。SFS基于单位区间上Schrödinger-Föllmer扩散过程 $$\mathrm{d}X_{t}=-\nabla U\left(X_t, t\right)\mathrm{d}t+\mathrm{d}B_{t},\quad t\in[0,1],\quad X_0=0$$ 的Euler-Maruyama离散化,该过程把时刻0处的退化分布输运到时刻1处的目标分布。在文献[sfs21]中,SFS的一致性是在一个限制性假设下建立的,即势$U(x,t)$(关于$t$)一致地(关于$x$)强凸。本文在目标分布相对标准正态分布的密度比满足某些光滑有界条件的前提下,给出了SFS在Wasserstein距离下的非渐近误差界,而不要求势的强凸性。 摘要:Schrödinger-Föllmer sampler (SFS) is a novel and efficient approach for sampling from possibly unnormalized distributions without ergodicity. SFS is based on the Euler-Maruyama discretization of the Schrödinger-Föllmer diffusion process $$\mathrm{d} X_{t}=-\nabla U\left(X_t, t\right) \mathrm{d} t+\mathrm{d} B_{t}, \quad t \in[0,1],\quad X_0=0$$ on the unit interval, which transports the degenerate distribution at time zero to the target distribution at time one. In [sfs21], the consistency of SFS is established under a restricted assumption that the potential $U(x,t)$ is uniformly (on $t$) strongly convex (on $x$). In this paper we provide a nonasymptotic error bound of SFS in Wasserstein distance under some smooth and bounded conditions on the density ratio of the target distribution over the standard normal distribution, but without requiring the strong convexity of the potential.
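下面给出上式Euler-Maruyama离散化的最小示意。漂移函数 drift 以参数形式传入:实际SFS的漂移由目标分布相对标准正态的密度比经热半群期望给出,此处不展开,接口与步数均为示例假设。

```python
import numpy as np

def euler_maruyama(drift, n_samples=1000, dim=2, n_steps=100, seed=0):
    # 在 [0,1] 上离散化 dX_t = -∇U(X_t, t) dt + dB_t, X_0 = 0
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    x = np.zeros((n_samples, dim))
    for k in range(n_steps):
        t = k * dt
        x = x + drift(x, t) * dt + np.sqrt(dt) * rng.standard_normal(x.shape)
    return x  # 时刻 1 处,近似目标分布的样本

# 用法示意(grad_U 为假设的势梯度):
# samples = euler_maruyama(lambda x, t: -grad_U(x, t))
```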

检测相关(6篇)

【1】 LATTE: LSTM Self-Attention based Anomaly Detection in Embedded Automotive Platforms 标题:LATTE:嵌入式汽车平台中基于LSTM自注意力的异常检测

作者:Vipin K. Kukkala,Sooryaa V. Thiruloga,Sudeep Pasricha 机构: Colorado State University SOORYAA VIGNESH THIRULOGA, Colorado State University SUDEEP PASRICHA 链接:https://arxiv.org/abs/2107.05561 摘要:现代汽车可以被认为是一个复杂的分布式嵌入式系统,运行各种具有实时约束的汽车应用程序。汽车工业在实现更大自主性方面的最新进展使车辆越来越多地与各种外部系统(如路边信标、其他车辆)连接,这使得新兴车辆极易受到网络攻击。此外,汽车应用和车内网络的复杂性增加导致攻击可见性差,这使得检测此类攻击在汽车系统中尤其具有挑战性。在这项工作中,我们提出了一种新的异常检测框架,称为LATTE,用于检测汽车平台中基于控制器局域网(CAN)的网络中的网络攻击。我们提出的LATTE框架使用堆叠式长-短期记忆(LSTM)预测网络和新颖的注意机制来学习设计时的正常操作行为。随后,使用一种新的检测方案(也在设计时训练)在运行时检测各种网络攻击(作为异常)。我们在不同的汽车攻击场景下评估了我们提出的LATTE框架,并与该领域最著名的先前工作进行了详细的比较,以证明我们方法的潜力。 摘要:Modern vehicles can be thought of as complex distributed embedded systems that run a variety of automotive applications with real-time constraints. Recent advances in the automotive industry towards greater autonomy are driving vehicles to be increasingly connected with various external systems (e.g., roadside beacons, other vehicles), which makes emerging vehicles highly vulnerable to cyber-attacks. Additionally, the increased complexity of automotive applications and the in-vehicle networks results in poor attack visibility, which makes detecting such attacks particularly challenging in automotive systems. In this work, we present a novel anomaly detection framework called LATTE to detect cyber-attacks in Controller Area Network (CAN) based networks within automotive platforms. Our proposed LATTE framework uses a stacked Long Short Term Memory (LSTM) predictor network with novel attention mechanisms to learn the normal operating behavior at design time. Subsequently, a novel detection scheme (also trained at design time) is used to detect various cyber-attacks (as anomalies) at runtime. We evaluate our proposed LATTE framework under different automotive attack scenarios and present a detailed comparison with the best-known prior works in this area, to demonstrate the potential of our approach.
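下面以最小示意说明这类"预测式"入侵检测的运行时逻辑:用在正常CAN流量上训练的LSTM预测下一帧信号,预测误差超过设计阶段校准的阈值即判为异常。模型结构(论文为堆叠LSTM加注意力机制)与阈值选取均为简化假设。

```python
import torch
import torch.nn as nn

class SignalPredictor(nn.Module):
    # 在正常 CAN 信号上训练的单层 LSTM 预测器(简化:论文为堆叠 LSTM + 注意力)
    def __init__(self, n_signals=8, hidden=64):
        super().__init__()
        self.lstm = nn.LSTM(n_signals, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_signals)

    def forward(self, window):              # window: (B, T, n_signals)
        out, _ = self.lstm(window)
        return self.head(out[:, -1])        # 预测下一时刻的信号值

def is_anomalous(model, window, next_frame, threshold):
    # 运行时:预测误差超过校准阈值即判为异常(可能的网络攻击)
    with torch.no_grad():
        err = (model(window) - next_frame).abs().max().item()
    return err > threshold
```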

【2】 WVOQ at SemEval-2021 Task 6: BART for Span Detection and Classification 标题:WVOQ在SemEval-2021任务6:用于跨度检测与分类的BART

作者:Cees Roele 备注:None 链接:https://arxiv.org/abs/2107.05467 摘要:提出了一种新的跨度检测与分类方法:利用BART编码器-解码器模型,将文本输入转换为带有类XML标记跨度的版本,随后将该标记转换为片段起止位置及其类别的标识。文中讨论了预训练方法如何既解释了这种方法的相对成功,也解释了它的局限性。本文报告了对SemEval-2021任务6(文本和图像中劝说技巧的检测)的参与情况。 摘要:A novel solution to span detection and classification is presented in which a BART encoder-decoder model is used to transform textual input into a version with XML-like marked up spans. This markup is subsequently translated to an identification of the beginning and end of fragments and of their classes. Discussed is how pre-training methodology both explains the relative success of this method and its limitations. This paper reports on participation in task 6 of SemEval-2021: Detection of Persuasion Techniques in Texts and Images.

【3】 Nearest neighbour approaches for Emotion Detection in Tweets 标题:推文情感检测的最近邻方法

作者:Olha Kaminska,Chris Cornelis,Veronique Hoste 机构:Computational Web Intelligence, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, LT, Language and Translation Technology Team 备注:The paper was presented at EACL 2021 during the WASSA workshop as a poster and published at ACL Anthology 链接:https://arxiv.org/abs/2107.05394 摘要:情感检测是一项重要任务,可应用于社交媒体数据以发现新知识。虽然使用深度学习方法完成这项任务已很普遍,但它们是黑盒模型,其决策难以被人类操作者理解。因此,在本文中我们提出一种使用加权k近邻(kNN)的方法,这是一种简单、易于实现且可解释的机器学习模型;这些性质有助于提高结果的可靠性并指导误差分析。具体而言,我们将加权kNN模型应用于SemEval-2018推文情感检测共享任务:推文用不同的文本嵌入方法和情感词典得分表示,并由加权kNN模型的集成完成分类。我们的最佳方法取得了可与最先进解决方案竞争的结果,为神经网络方法开辟了一条有希望的替代途径。 摘要:Emotion detection is an important task that can be applied to social media data to discover new knowledge. While the use of deep learning methods for this task has been prevalent, they are black-box models, making their decisions hard to interpret for a human operator. Therefore, in this paper, we propose an approach using weighted $k$ Nearest Neighbours (kNN), a simple, easy to implement, and explainable machine learning model. These qualities can help to enhance results' reliability and guide error analysis. In particular, we apply the weighted kNN model to the shared emotion detection task in tweets from SemEval-2018. Tweets are represented using different text embedding methods and emotion lexicon vocabulary scores, and classification is done by an ensemble of weighted kNN models. Our best approaches obtain results competitive with state-of-the-art solutions and open up a promising alternative path to neural network methods.
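下面是加权kNN情感分类思路的最小示意:推文以句向量表示,按与邻居的余弦相似度加权投票,且预测可回溯到具体邻居,从而保持可解释性。嵌入来源与权重形式为示例假设。

```python
import numpy as np

def weighted_knn_predict(query, X, y, k=5):
    # X: (N, d) 训练推文嵌入;y: (N,) 情感标签;query: (d,) 待预测推文嵌入
    sims = X @ query / (np.linalg.norm(X, axis=1) * np.linalg.norm(query) + 1e-8)
    idx = np.argsort(-sims)[:k]             # 余弦相似度最高的 k 个邻居
    votes = {}
    for i in idx:                           # 相似度即投票权重;决策可追溯到具体邻居
        votes[y[i]] = votes.get(y[i], 0.0) + sims[i]
    return max(votes, key=votes.get)
```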

【4】 Fuzzy-Rough Nearest Neighbour Approaches for Emotion Detection in Tweets 标题:模糊-粗糙最近邻法在推文情感检测中的应用

作者:Olha Kaminska,Chris Cornelis,Veronique Hoste 机构:Computational Web Intelligence, Department of Applied Mathematics, Computer Science and Statistics, Ghent University, Ghent, Belgium, LT, Language and Translation Technology Team 备注:The paper submitted to the IJCRS 2021 conference, organized jointly with IFSA-EUSFLAT 2021 链接:https://arxiv.org/abs/2107.05392 摘要:社交媒体是有意义数据的重要来源,可用于情感分析和情绪识别等不同任务。这些任务大多用深度学习方法求解;考虑到文本数据的模糊性,我们转而考虑基于模糊粗糙集的分类方法。具体来说,我们针对SemEval-2018情感检测任务提出了一种基于模糊粗糙近邻(FRNN)分类器、并用有序加权平均(OWA)算子增强的方法。我们使用了基于不同文本嵌入方法的FRNN-OWA模型的调优集成。我们的结果可与基于更复杂深度学习方法的最佳SemEval解决方案相媲美。 摘要:Social media are an essential source of meaningful data that can be used in different tasks such as sentiment analysis and emotion recognition. Mostly, these tasks are solved with deep learning methods. Due to the fuzzy nature of textual data, we consider using classification methods based on fuzzy rough sets. Specifically, we develop an approach for the SemEval-2018 emotion detection task, based on the fuzzy rough nearest neighbour (FRNN) classifier enhanced with ordered weighted average (OWA) operators. We use tuned ensembles of FRNN--OWA models based on different text embedding methods. Our results are competitive with the best SemEval solutions based on more complicated deep learning methods.

【5】 Out-of-Distribution Dynamics Detection: RL-Relevant Benchmarks and Results 标题:分布外动态检测:与RL相关的基准和结果

作者:Mohamad H Danesh,Alan Fern 机构:School of Electrical Engineering and Computer Science, Oregon State University 备注:ICML 2021 Workshop on Uncertainty and Robustness in Deep Learning 链接:https://arxiv.org/abs/2107.04982 摘要:我们研究分布外动态(OODD)检测问题,即检测时间过程的动态何时相对于训练分布的动态发生了变化。这与控制、强化学习(RL)和多变量时间序列中的应用相关:在这些应用中,测试时动态的变化会以未知方式影响学习到的控制器/预测器的性能。这个问题在深度RL背景下尤其重要,因为学习到的控制器常常过拟合训练环境。然而,对于RL研究中常用的环境类型,目前还缺乏成熟的OODD基准。我们的第一个贡献是设计一组源自常见RL环境、具有不同OODD类型和强度的OODD基准。第二个贡献是设计一种基于递归隐式分位数网络(RIQN)的强OODD基线方法,该方法监控自回归预测误差以进行OODD检测。最后,我们在这些基准上评估RIQN方法,为未来的比较提供基线结果。 摘要:We study the problem of out-of-distribution dynamics (OODD) detection, which involves detecting when the dynamics of a temporal process change compared to the training-distribution dynamics. This is relevant to applications in control, reinforcement learning (RL), and multi-variate time-series, where changes to test time dynamics can impact the performance of learning controllers/predictors in unknown ways. This problem is particularly important in the context of deep RL, where learned controllers often overfit to the training environment. Currently, however, there is a lack of established OODD benchmarks for the types of environments commonly used in RL research. Our first contribution is to design a set of OODD benchmarks derived from common RL environments with varying types and intensities of OODD. Our second contribution is to design a strong OODD baseline approach based on recurrent implicit quantile networks (RIQNs), which monitors autoregressive prediction errors for OODD detection. Our final contribution is to evaluate the RIQN approach on the benchmarks to provide baseline results for future comparison.

【6】 U-Net with Hierarchical Bottleneck Attention for Landmark Detection in Fundus Images of the Degenerated Retina 标题:用于变性视网膜眼底图像界标检测的分层瓶颈注意力U-Net

作者:Shuyun Tang,Ziming Qi,Jacob Granley,Michael Beyeler 机构:University of California, Santa Barbara, CA 链接:https://arxiv.org/abs/2107.04721 摘要:在临床实践中,眼底摄影通常用于记录年龄相关性黄斑变性(AMD)、青光眼和糖尿病视网膜病变(DR)等视网膜退行性疾病的存在与严重程度,其中中心凹和视盘(OD)是重要的视网膜界标。然而,视网膜变性过程中出现的病灶、玻璃膜疣(drusen)及其他视网膜异常,严重增加了界标自动检测与分割的难度。为此,我们提出了HBA-U-Net:一个加入分层瓶颈注意力的U-Net主干。该网络包含一个新颖的瓶颈注意力块,它结合并细化自注意力、通道注意力和相对位置注意力,以突出对变性视网膜中中心凹和OD分割可能重要的视网膜异常。HBA-U-Net在跨数据集、跨眼部状况的中心凹检测(ADAM:欧氏距离(ED)为25.4像素,REFUGE:32.5像素,IDRiD:32.1像素)、AMD的OD分割(ADAM:Dice系数(DC)为0.947)以及DR的OD检测(IDRiD:ED为20.5像素)上均取得了最先进的结果。我们的结果表明,HBA-U-Net可能非常适合多种视网膜退行性疾病下的界标检测。 摘要:Fundus photography has routinely been used to document the presence and severity of retinal degenerative diseases such as age-related macular degeneration (AMD), glaucoma, and diabetic retinopathy (DR) in clinical practice, for which the fovea and optic disc (OD) are important retinal landmarks. However, the occurrence of lesions, drusen, and other retinal abnormalities during retinal degeneration severely complicates automatic landmark detection and segmentation. Here we propose HBA-U-Net: a U-Net backbone enriched with hierarchical bottleneck attention. The network consists of a novel bottleneck attention block that combines and refines self-attention, channel attention, and relative-position attention to highlight retinal abnormalities that may be important for fovea and OD segmentation in the degenerated retina. HBA-U-Net achieved state-of-the-art results on fovea detection across datasets and eye conditions (ADAM: Euclidean Distance (ED) of 25.4 pixels, REFUGE: 32.5 pixels, IDRiD: 32.1 pixels), on OD segmentation for AMD (ADAM: Dice Coefficient (DC) of 0.947), and on OD detection for DR (IDRiD: ED of 20.5 pixels). Our results suggest that HBA-U-Net may be well suited for landmark detection in the presence of a variety of retinal degenerative diseases.

分类|识别(4篇)

【1】 Fine-Grained AutoAugmentation for Multi-label Classification 标题:用于多标签分类的细粒度自动增强算法

作者:Ya Wang,Hesen Chen,Fangyi Zhang,Yaohua Wang,Xiuyu Sun,Ming Lin,Hao Li 机构:Alibaba Group 链接:https://arxiv.org/abs/2107.05384 摘要:数据扩充是提高深度学习模型泛化能力的常用方法。最近的研究表明,学习的数据扩充策略比手工构建的数据扩充策略具有更好的泛化能力。然而,这些工作大多对数据集中的所有样本使用统一的增广策略,这对多标签分类任务中的所有标签都不一定有利,即有些策略可能对某些标签产生负面影响,而对另一些标签有利。为了解决这个问题,我们提出了一种新的基于标签的自动增广(LB-Aug)方法,该方法通过增广策略网络针对标签生成增广策略。利用策略梯度方法,通过强化学习学习策略,提供从实例标签到最优扩充策略的映射。数值实验表明,在图像和视频分类的多个基准测试中,我们的LB-Aug方法比现有的增强方法有较大的优势。 摘要:Data augmentation is a commonly used approach to improving the generalization of deep learning models. Recent works show that learned data augmentation policies can achieve better generalization than hand-crafted ones. However, most of these works use unified augmentation policies for all samples in a dataset, which is observed not necessarily beneficial for all labels in multi-label classification tasks, i.e., some policies may have negative impacts on some labels while benefitting the others. To tackle this problem, we propose a novel Label-Based AutoAugmentation (LB-Aug) method for multi-label scenarios, where augmentation policies are generated with respect to labels by an augmentation-policy network. The policies are learned via reinforcement learning using policy gradient methods, providing a mapping from instance labels to their optimal augmentation policies. Numerical experiments show that our LB-Aug outperforms previous state-of-the-art augmentation methods by large margins in multiple benchmarks on image and video classification.
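下面是"用策略梯度(REINFORCE)更新按标签生成增广策略的策略网络"这一思路的极简示意(网络结构、策略数量与奖励定义均为便于说明的假设;论文中的奖励应来自增广后模型的验证性能):

```python
import torch
import torch.nn as nn

n_labels, n_policies = 10, 16
# 以样本的多标签向量为输入、输出各增广策略得分的策略网络(结构为假设)
policy_net = nn.Sequential(nn.Linear(n_labels, 64), nn.ReLU(),
                           nn.Linear(64, n_policies))
opt = torch.optim.Adam(policy_net.parameters(), lr=1e-3)

labels = torch.randint(0, 2, (32, n_labels)).float()   # 一批多标签样本的标签
logits = policy_net(labels)
dist = torch.distributions.Categorical(logits=logits)
actions = dist.sample()                                # 为每个样本采样一个增广策略
reward = torch.randn(32)                               # 实际应为验证集上的性能增益(此处随机代替)
# REINFORCE:以去均值的奖励加权对数似然,更新"标签 -> 最优增广策略"的映射
loss = -(dist.log_prob(actions) * (reward - reward.mean())).mean()
opt.zero_grad(); loss.backward(); opt.step()
```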

【2】 Spectro-Temporal RF Identification using Deep Learning 标题:基于深度学习的频谱-时间射频识别

作者:Hai N. Nguyen,Marinos Vomvas,Triet Vo-Huu,Guevara Noubir 机构:Cybersecurity and Privacy Institute, Northeastern University 链接:https://arxiv.org/abs/2107.05114 摘要:射频发射的检测、分类和谱-时定位不仅对于理解、管理和保护射频频谱的任务至关重要,而且对于检测入侵无人机或干扰机等安全和安保应用也至关重要。在宽带频谱上以实时性能实现这一目标是一个具有挑战性的问题。本文提出了WRIST——一种具有谱-时检测能力的宽带实时射频识别框架与系统。我们得到的深度学习模型能够使用100 MHz频谱的实时射频样本(超过6Gbps的输入I&Q流),在时间和频率上对射频发射进行检测、分类和精确定位。这一能力之所以可行,得益于基于深度学习的单阶段目标检测框架,以及向基于多通道图像的射频信号表示的迁移学习。我们还提出了一种迭代训练方法,利用合成和增强的射频数据高效地构建大型的射频发射标注数据集(SPREAD)。即使在野外极度拥挤的环境中,WRIST检测器也能达到90%的平均精度(mAP)。WRIST模型可对五种技术(蓝牙、Lightbridge、Wi-Fi、XPD和ZigBee)进行分类,并且很容易扩展到其他技术。我们正在向整个社区提供我们策划和标注的数据集。它由近100万个完全标注的射频发射组成,这些射频发射是在各种环境下从各种现成的无线电设备中收集的,覆盖五类发射。 摘要:RF emissions detection, classification, and spectro-temporal localization are crucial not only for tasks relating to understanding, managing, and protecting the RF spectrum, but also for safety and security applications such as detecting intruding drones or jammers. Achieving this goal for wideband spectrum and in real-time performance is a challenging problem. We present WRIST, a Wideband, Real-time RF Identification system with Spectro-Temporal detection, framework and system. Our resulting deep learning model is capable to detect, classify, and precisely locate RF emissions in time and frequency using RF samples of 100 MHz spectrum in real-time (over 6Gbps incoming I&Q streams). Such capabilities are made feasible by leveraging a deep-learning based one-stage object detection framework, and transfer learning to a multi-channel image-based RF signals representation. We also introduce an iterative training approach which leverages synthesized and augmented RF data to efficiently build large labelled datasets of RF emissions (SPREAD). WRIST detector achieves 90% mean Average Precision even in extremely congested environment in the wild. WRIST model classifies five technologies (Bluetooth, Lightbridge, Wi-Fi, XPD, and ZigBee) and is easily extendable to others. We are making our curated and annotated dataset available to the whole community. It consists of nearly 1 million fully labelled RF emissions collected from various off-the-shelf wireless radios in a range of environments and spanning the five classes of emissions.

【3】 Positive-Unlabeled Classification under Class-Prior Shift: A Prior-invariant Approach Based on Density Ratio Estimation 标题:类先验偏移下的正无标记分类:一种基于密度比估计的先验不变量方法

作者:Shota Nakajima,Masashi Sugiyama 机构:The University of Tokyo, RIKEN 备注:18 pages, 4 figures 链接:https://arxiv.org/abs/2107.05045 摘要:从正样本和未标注(PU)数据中学习是各种应用中的一个重要问题。目前大多数PU分类方法都假设训练用未标注数据集中的类先验(正样本的比率)与测试数据中的类先验相同,而这在许多实际情况下并不成立。此外,我们通常既不知道训练数据也不知道测试数据的类先验,因而在缺少它们的情况下无从训练分类器。针对这些问题,我们提出了一种基于密度比估计的新PU分类方法。该方法的一个显著优点是训练阶段不需要类先验;类先验偏移仅在测试阶段被纳入考虑。我们从理论上论证了所提方法的合理性,并通过实验证明了它的有效性。 摘要:Learning from positive and unlabeled (PU) data is an important problem in various applications. Most of the recent approaches for PU classification assume that the class-prior (the ratio of positive samples) in the training unlabeled dataset is identical to that of the test data, which does not hold in many practical cases. In addition, we usually do not know the class-priors of the training and test data, thus we have no clue on how to train a classifier without them. To address these problems, we propose a novel PU classification method based on density ratio estimation. A notable advantage of our proposed method is that it does not require the class-priors in the training phase; class-prior shift is incorporated only in the test phase. We theoretically justify our proposed method and experimentally demonstrate its effectiveness.
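下面是"基于密度比估计的PU分类、且类先验仅在测试阶段引入"这一思路的极简示意(用"正样本 vs 未标注"的概率分类器近似密度比是常见做法之一,数据分布、阈值化的具体形式均为便于说明的假设,论文的判别规则以原文为准):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pos = rng.normal(+1, 1, (500, 1))                     # 已标注的正样本 P
unl = np.vstack([rng.normal(+1, 1, (300, 1)),         # 未标注样本 U(正负混合)
                 rng.normal(-1, 1, (700, 1))])

# 训练"P vs U"的概率分类器,其赔率经配比校正后近似密度比 r(x) = p_P(x) / p_U(x)
clf = LogisticRegression().fit(np.vstack([pos, unl]),
                               np.r_[np.ones(len(pos)), np.zeros(len(unl))])

def density_ratio(x):
    p = clf.predict_proba(x)[:, 1]
    return (p / (1 - p)) * (len(unl) / len(pos))

# 训练阶段未用到任何类先验;测试阶段才引入测试数据的类先验 pi_test
pi_test = 0.5
x_test = rng.normal(0, 1.5, (10, 1))
pred_pos = density_ratio(x_test) * pi_test > 0.5      # 阈值形式仅为示意
print(pred_pos)
```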

【4】 Effect of Input Size on the Classification of Lung Nodules Using Convolutional Neural Networks 标题:输入大小对卷积神经网络用于肺结节分类的影响

作者:Gorkem Polat,Yesim Dogrusoz Serinagaoglu,Ugur Halici 机构: ULAKBIM UASL - MIDDLE EAST TECHNICAL UNIVERSITY 备注:4 pages, in Turkish language, 2018 26th Signal Processing and Communications Applications Conference (SIU) 链接:https://arxiv.org/abs/2107.05085 摘要:最近的研究表明,与传统的胸片检查相比,每年使用低剂量计算机断层扫描(CT)进行肺癌筛查可降低肺癌死亡率20%。因此,CT肺筛查已开始在世界范围内广泛应用。然而,分析这些图像对放射科医生来说是一个严重的负担。一次CT扫描的切片数可达600个,因此,计算机辅助检测(CAD)系统对于更快、更准确地评估数据非常重要。在这项研究中,我们提出了一个框架,分析CT肺筛查使用卷积神经网络(CNNs)以减少假阳性。我们用不同的体积大小训练了我们的模型,并且证明了体积大小对系统的性能起着关键的作用。我们还使用了不同的融合,以显示他们的力量和对整体精度的影响。三维CNN比二维CNN更受欢迎,因为应用于三维数据的二维卷积运算可能导致信息丢失。该框架已在LUNA16挑战提供的数据集上进行了测试,在每次扫描1个假阳性时,灵敏度为0.831。 摘要:Recent studies have shown that lung cancer screening using annual low-dose computed tomography (CT) reduces lung cancer mortality by 20% compared to traditional chest radiography. Therefore, CT lung screening has started to be used widely all across the world. However, analyzing these images is a serious burden for radiologists. The number of slices in a CT scan can be up to 600. Therefore, computer-aided-detection (CAD) systems are very important for faster and more accurate assessment of the data. In this study, we proposed a framework that analyzes CT lung screenings using convolutional neural networks (CNNs) to reduce false positives. We trained our model with different volume sizes and showed that volume size plays a critical role in the performance of the system. We also used different fusions in order to show their power and effect on the overall accuracy. 3D CNNs were preferred over 2D CNNs because 2D convolutional operations applied to 3D data could result in information loss. The proposed framework has been tested on the dataset provided by the LUNA16 Challenge and resulted in a sensitivity of 0.831 at 1 false positive per scan.

表征(2篇)

【1】 A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution 标题:一种面向高级自然语言指令执行的持久空间语义表示

作者:Valts Blukis,Chris Paxton,Dieter Fox,Animesh Garg,Yoav Artzi 机构:NVIDIA, Cornell University, University of Washington, University of Toronto, Vector Institute 备注:Submitted to CoRL 2021 链接:https://arxiv.org/abs/2107.05612 摘要:自然语言为机器人代理提供了一个可访问的、可表达的接口来指定长期任务。然而,非专家可能会用高级指令来指定这些任务,这些指令通过几个抽象层抽象特定的机器人动作。我们认为,在长时间的执行范围内,语言和机器人动作之间弥合这一鸿沟的关键是持久性表示。我们提出了一种持久的空间语义表示方法,并展示了如何构建一个执行分层推理的代理来有效地执行长期任务。我们在ALFRED基准上评估我们的方法,并获得最先进的结果,尽管完全避免了常用的分步说明。 摘要:Natural language provides an accessible and expressive interface to specify long-term tasks for robotic agents. However, non-experts are likely to specify such tasks with high-level instructions, which abstract over specific robot actions through several layers of abstraction. We propose that key to bridging this gap between language and robot actions over long execution horizons are persistent representations. We propose a persistent spatial semantic representation method, and show how it enables building an agent that performs hierarchical reasoning to effectively execute long-term tasks. We evaluate our approach on the ALFRED benchmark and achieve state-of-the-art results, despite completely avoiding the commonly used step-by-step instructions.

【2】 InfoVAEGAN : learning joint interpretable representations by information maximization and maximum likelihood 标题:InfoVAEGAN:基于信息最大化和最大似然的联合可解释表示学习

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted at International Conference on Image Processing (ICIP 2021) 链接:https://arxiv.org/abs/2107.04705 摘要:学习分离的和可解释的表示是在流形上实现综合数据表示的一个重要步骤。本文提出了一种新的表示学习算法,该算法结合了变分自编码器(VAE)的推理能力和生成对抗网络(GAN)的泛化能力。该模型称为InfoVAEGAN,由三个网络组成:编码器、发生器和鉴别器。InfoVAEGAN的目标是通过使用两个不同的无数据对数似然函数对从生成器分布中采样的变量,以无监督的方式联合学习离散和连续的可解释表示。我们提出了一个两阶段的算法,分别优化推理网络和生成器训练。此外,我们通过最大化现有潜变量与通过生成和推理过程产生的潜变量之间的互信息来加强可解释表示的学习。 摘要:Learning disentangled and interpretable representations is an important step towards accomplishing comprehensive data representations on the manifold. In this paper, we propose a novel representation learning algorithm which combines the inference abilities of Variational Autoencoders (VAE) with the generalization capability of Generative Adversarial Networks (GAN). The proposed model, called InfoVAEGAN, consists of three networks~: Encoder, Generator and Discriminator. InfoVAEGAN aims to jointly learn discrete and continuous interpretable representations in an unsupervised manner by using two different data-free log-likelihood functions onto the variables sampled from the generator's distribution. We propose a two-stage algorithm for optimizing the inference network separately from the generator training. Moreover, we enforce the learning of interpretable representations through the maximization of the mutual information between the existing latent variables and those created through generative and inference processes.

优化|敛散性(3篇)

【1】 Differentially Private Stochastic Optimization: New Results in Convex and Non-Convex Settings 标题:差分隐私随机优化:凸与非凸设置下的新结果

作者:Cristóbal Guzmán,Raef Bassily,Michael Menart 机构:The Ohio State University; Department of Applied Mathematics, University of Twente; Institute for Mathematical and Computational Engineering, Pontificia Universidad Católica de Chile 链接:https://arxiv.org/abs/2107.05585 摘要:我们研究了凸与非凸设置下的差分隐私随机优化问题。对于凸情形,我们重点研究非光滑广义线性损失(GLL)族。我们针对$\ell_2$设置的算法在近线性时间内达到最优的超额总体风险,而已知最好的针对一般凸损失的差分隐私算法需要超线性时间。我们针对$\ell_1$设置的算法具有接近最优的超额总体风险$\tilde{O}\big(\sqrt{\frac{\log{d}}{n}}\big)$,并且绕过了[AFKT21]针对一般非光滑凸损失给出的依赖于维数的下界。在差分隐私非凸设置下,我们提出了若干新算法来逼近总体风险的驻点。对于具有光滑损失和多面体约束的$\ell_1$情形,我们给出了第一个近似与维数无关的速率$\tilde O\big(\frac{\log^{2/3}{d}}{n^{1/3}}\big)$,且算法运行于线性时间。对于带约束的$\ell_2$情形,在光滑损失下,我们得到了一个速率为$\tilde O\big(\frac{1}{n^{3/10}d^{1/10}}+\big(\frac{d}{n^2}\big)^{1/5}\big)$的线性时间算法。最后,对于$\ell_2$情形,我们给出了第一个用于非光滑弱凸随机优化的方法,速率为$\tilde O\big(\frac{1}{n^{1/4}}+\big(\frac{d}{n^2}\big)^{1/6}\big)$,当$d=O(\sqrt{n})$时,它与现有最好的非隐私算法相匹配。我们还将上述非凸$\ell_2$设置的所有结果推广到$\ell_p$设置($1 < p \leq 2$),速率仅增加(关于维数的)多对数开销。 摘要:We study differentially private stochastic optimization in convex and non-convex settings. For the convex case, we focus on the family of non-smooth generalized linear losses (GLLs). Our algorithm for the $\ell_2$ setting achieves optimal excess population risk in near-linear time, while the best known differentially private algorithms for general convex losses run in super-linear time. Our algorithm for the $\ell_1$ setting has nearly-optimal excess population risk $\tilde{O}\big(\sqrt{\frac{\log{d}}{n}}\big)$, and circumvents the dimension dependent lower bound of [AFKT21] for general non-smooth convex losses. In the differentially private non-convex setting, we provide several new algorithms for approximating stationary points of the population risk. For the $\ell_1$-case with smooth losses and polyhedral constraint, we provide the first nearly dimension independent rate, $\tilde O\big(\frac{\log^{2/3}{d}}{{n^{1/3}}}\big)$ in linear time. For the constrained $\ell_2$-case, with smooth losses, we obtain a linear-time algorithm with rate $\tilde O\big(\frac{1}{n^{3/10}d^{1/10}}+\big(\frac{d}{n^2}\big)^{1/5}\big)$. Finally, for the $\ell_2$-case we provide the first method for {\em non-smooth weakly convex} stochastic optimization with rate $\tilde O\big(\frac{1}{n^{1/4}}+\big(\frac{d}{n^2}\big)^{1/6}\big)$ which matches the best existing non-private algorithm when $d= O(\sqrt{n})$. We also extend all our results above for the non-convex $\ell_2$ setting to the $\ell_p$ setting, where $1 < p \leq 2$, with only polylogarithmic (in the dimension) overhead in the rates.

【2】 Dual Optimization for Kolmogorov Model Learning Using Enhanced Gradient Descent 标题:基于增强型梯度下降的Kolmogorov模型学习的对偶优化

作者:Qiyou Duan,Hadi Ghauch,Taejoon Kim 机构:Department of COMELEC 备注:Submitted to IEEE Transactions on Signal Processing 链接:https://arxiv.org/abs/2107.05011 摘要:数据表示技术为推进数据处理和机器学习做出了重大贡献。以往的表示技术以提高预测能力为重点,但遗憾的是,在提取数据内在洞见的可解释性方面表现相当差。最近,Kolmogorov模型(KM)得到了研究,它是一种可解释且可预测的表示方法,用于学习一组随机变量的潜在概率结构。然而,现有的基于带随机化的半定松弛(SDRwR)或离散单调优化(DMO)的KM学习算法在大数据应用中的实用性受到限制,因为它们在计算上不能很好地扩展。本文提出了一种计算上可扩展的KM学习算法,它基于正则化对偶优化并结合增强的梯度下降(GD)方法。为了使我们的方法对大维数问题更具可扩展性,我们提出了两种加速方案,即特征值分解(EVD)消去策略和近似EVD算法。此外,利用近似误差分析以及归一化Minkowski $\ell_1$-范数及其界,提供了一种用于选择近似EVD算法迭代次数的阈值技术。应用于大数据场景的结果表明,所提方法可以在显著降低计算复杂度的前提下获得相当的训练/预测性能;与现有的KM学习算法相比,时间开销大约有两个数量级的改进。此外,利用所提KM学习算法进行面向可解释性的逻辑关系挖掘,其准确率超过$80\%$。 摘要:Data representation techniques have made a substantial contribution to advancing data processing and machine learning (ML). Improving predictive power was the focus of previous representation techniques, which unfortunately perform rather poorly on the interpretability in terms of extracting underlying insights of the data. Recently, Kolmogorov model (KM) was studied, which is an interpretable and predictable representation approach to learning the underlying probabilistic structure of a set of random variables. The existing KM learning algorithms using semi-definite relaxation with randomization (SDRwR) or discrete monotonic optimization (DMO) have, however, limited utility to big data applications because they do not scale well computationally. In this paper, we propose a computationally scalable KM learning algorithm, based on the regularized dual optimization combined with enhanced gradient descent (GD) method. To make our method more scalable to large-dimensional problems, we propose two acceleration schemes, namely, eigenvalue decomposition (EVD) elimination strategy and proximal EVD algorithm. Furthermore, a thresholding technique by exploiting the approximation error analysis and leveraging the normalized Minkowski $\ell_1$-norm and its bounds, is provided for the selection of the number of iterations of the proximal EVD algorithm. When applied to big data applications, it is demonstrated that the proposed method can achieve compatible training/prediction performance with significantly reduced computational complexity; roughly two orders of magnitude improvement in terms of the time overhead, compared to the existing KM learning algorithms. Furthermore, it is shown that the accuracy of logical relation mining for interpretability by using the proposed KM learning algorithm exceeds $80\%$.

【3】 L2M: Practical posterior Laplace approximation with optimization-driven second moment estimation 标题:L2M:优化驱动二阶矩估计的实用后验拉普拉斯逼近

作者:Christian S. Perone,Roberto Pereira Silveira,Thomas Paula 备注:6 pages, 1 figure, accepted for ICML 2021 UDL Workshop 链接:https://arxiv.org/abs/2107.04695 摘要:深度神经网络的不确定性量化近年来通过多种技术不断发展。在这项工作中,我们重新审视拉普拉斯近似——一种计算上有吸引力的经典后验近似方法。不过,我们不去计算曲率矩阵,而是证明了在某些正则性条件下,利用梯度二阶矩就可以很容易地构造拉普拉斯近似。这个量已经被Adagrad的许多指数滑动平均变体(如Adam和RMSprop)所估计,但传统上在训练结束后即被丢弃。我们证明了我们的方法(L2M)不需要改变模型或优化过程,只需几行代码即可得到合理的结果,并且除了优化器已经计算的内容外不需要任何额外的计算步骤,也不引入任何新的超参数。我们希望这一方法能为利用优化器已经计算出的量进行深度神经网络不确定性估计开辟新的研究方向。 摘要:Uncertainty quantification for deep neural networks has recently evolved through many techniques. In this work, we revisit Laplace approximation, a classical approach for posterior approximation that is computationally attractive. However, instead of computing the curvature matrix, we show that, under some regularity conditions, the Laplace approximation can be easily constructed using the gradient second moment. This quantity is already estimated by many exponential moving average variants of Adagrad such as Adam and RMSprop, but is traditionally discarded after training. We show that our method (L2M) does not require changes in models or optimization, can be implemented in a few lines of code to yield reasonable results, and it does not require any extra computational steps besides what is already being computed by optimizers, without introducing any new hyperparameter. We hope our method can open new research directions on using quantities already computed by optimizers for uncertainty estimation in deep neural networks.
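按摘要所述思路的一个极简示意(训练结束后直接读取Adam状态中的梯度二阶矩 exp_avg_sq 来构造对角拉普拉斯后验方差;其中 prior_prec 与数据量 n 的具体缩放方式为假设,精确公式以论文为准):

```python
import torch

model = torch.nn.Linear(10, 1)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x, y = torch.randn(256, 10), torch.randn(256, 1)
for _ in range(100):                       # 常规训练循环,无需任何改动
    loss = ((model(x) - y) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

prior_prec, n = 1.0, x.shape[0]            # 先验精度与样本量(取值为假设)
posterior_var = []
for p in model.parameters():
    v = opt.state[p]["exp_avg_sq"]         # 优化器已维护的梯度二阶矩滑动平均
    posterior_var.append(1.0 / (n * v + prior_prec))   # 对角拉普拉斯近似的方差
print([v.shape for v in posterior_var])
```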

预测|估计(10篇)

【1】 Nonlinear Least Squares for Large-Scale Machine Learning using Stochastic Jacobian Estimates 标题:基于随机雅可比估计的非线性最小二乘大规模机器学习

作者:Johannes J. Brust 机构:Department of Mathematics, University of California San Diego 备注:None 链接:https://arxiv.org/abs/2107.05598 摘要:对于机器学习中的大型非线性最小二乘损失函数,我们利用了模型参数数量通常超过一批数据规模这一特性。这意味着损失的Hessian中存在低秩结构,使得搜索方向可以被高效地计算。利用这一性质,我们开发了两种估计雅可比矩阵的算法,它们与最先进的方法相比表现良好。 摘要:For large nonlinear least squares loss functions in machine learning we exploit the property that the number of model parameters typically exceeds the data in one batch. This implies a low-rank structure in the Hessian of the loss, which enables effective means to compute search directions. Using this property, we develop two algorithms that estimate Jacobian matrices and perform well when compared to state-of-the-art methods.
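下面用一个极简示意说明这种低秩结构为何有用(这只是利用"参数数 p 远大于批大小 m"这一性质的通用推导,并非论文的具体算法;论文中 J 由随机估计给出,此处直接随机生成):阻尼Gauss-Newton方向可以通过求解 m×m 而非 p×p 的线性系统得到。

```python
import numpy as np

rng = np.random.default_rng(0)
m, p, lam = 32, 10000, 1e-2
J = rng.normal(size=(m, p))        # 批残差对参数的雅可比(低秩:秩至多为 m)
r = rng.normal(size=m)             # 批残差

# 利用恒等式 (J^T J + lam I)^{-1} J^T r = J^T (J J^T + lam I)^{-1} r,
# 只需求解 m×m 的 Gram 系统,而不是 p×p 的系统
G = J @ J.T
alpha = np.linalg.solve(G + lam * np.eye(m), r)
step = -J.T @ alpha                # 阻尼 Gauss-Newton 搜索方向
print(step.shape)                  # (10000,)
```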

【2】 Comparing seven methods for state-of-health time series prediction for the lithium-ion battery packs of forklifts 标题:叉车锂离子电池组健康时间序列预测的七种方法比较

作者:Matti Huotari,Shashank Arora,Avleen Malhi,Kary Främling 机构:Department of Computer Science, Aalto University, Espoo, Finland., Department of Mechanical Engineering, Aalto University, Espoo, Finland., Department of Computer Science, Umeå University, Umeå, Sweden. 备注:None 链接:https://arxiv.org/abs/2107.05489 摘要:叉车的一个关键方面是健康状态(SoH)评估,以确保不间断电源的安全性和可靠性。准确预测电池的SoH对于实现预防性维护、从而降低成本是必不可少的。本文展示了在电池先验信息很少的情况下,梯度提升回归预测SoH时间序列的能力。我们将梯度提升方法与轻量梯度提升、极端随机树、极端梯度提升、随机森林、长短期记忆网络,以及卷积神经网络与长短期记忆网络相结合的方法进行了比较。我们使用多个预测因子,并将滞后目标信号的分解结果作为额外预测因子,同时比较各方法在不同预测因子集合下得到的预测结果。在这项工作中,我们拥有一个由45个锂离子电池组构成、数据差异很大的独特数据集。我们得到的最佳模型通过一种新颖的前向推进(walk-forward)算法进行了验证,该算法还为预测计算逐点置信区间;我们得到了合理的预测及其置信区间。此外,我们用另外五个锂离子电池组对该模型进行了验证;最佳模型在较大程度上泛化到了这组电池组。最终模型的结果表明,相对于先前开发的模型,我们的结果有所提升。此外,我们利用新叉车的数据进一步验证了先前工作中提出的用于提取循环次数的模型;这些叉车的电池组在10年的使用期内完成了约3000次循环,相当于商用镍钴锰(NMC)电池的循环寿命。 摘要:A key aspect for the forklifts is the state-of-health (SoH) assessment to ensure the safety and the reliability of uninterrupted power source. Forecasting the battery SoH well is imperative to enable preventive maintenance and hence to reduce the costs. This paper demonstrates the capabilities of gradient boosting regression for predicting the SoH timeseries under circumstances when there is little prior information available about the batteries. We compared the gradient boosting method with light gradient boosting, extra trees, extreme gradient boosting, random forests, long short-term memory networks and with combined convolutional neural network and long short-term memory networks methods. We used multiple predictors and lagged target signal decomposition results as additional predictors and compared the yielded prediction results with different sets of predictors for each method. For this work, we are in possession of a unique data set of 45 lithium-ion battery packs with large variation in the data. The best model that we derived was validated by a novel walk-forward algorithm that also calculates point-wise confidence intervals for the predictions; we yielded reasonable predictions and confidence intervals for the predictions. Furthermore, we verified this model against five other lithium-ion battery packs; the best model generalised to greater extent to this set of battery packs. The results about the final model suggest that we were able to enhance the results in respect to previously developed models. Moreover, we further validated the model for extracting cycle counts presented in our previous work with data from new forklifts; their battery packs completed around 3000 cycles in a 10-year service period, which corresponds to the cycle life for commercial Nickel-Cobalt-Manganese (NMC) cells.
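下面给出"以前向推进(walk-forward)方式评估梯度提升回归的SoH时间序列预测"的极简示意(SoH曲线为合成数据,滞后特征构造与滚动块大小均为假设;论文的特征集与置信区间计算见原文):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
# 合成的 SoH 序列:缓慢退化 + 噪声(仅用于演示)
soh = 1.0 - 0.0005 * np.arange(800) + rng.normal(0, 0.005, 800)
lags = 12
X = np.stack([soh[i - lags:i] for i in range(lags, len(soh))])  # 滞后特征
y = soh[lags:]

preds, truth = [], []
for t in range(600, len(y), 40):           # 分块向前滚动,始终只用过去数据训练
    model = GradientBoostingRegressor(n_estimators=100).fit(X[:t], y[:t])
    preds.extend(model.predict(X[t:t + 40]))
    truth.extend(y[t:t + 40])
print("walk-forward MAE:", np.mean(np.abs(np.array(preds) - np.array(truth))))
```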

【3】 AutoFB: Automating Fetal Biometry Estimation from Standard Ultrasound Planes 标题:AutoFB:从标准超声平面自动估计胎儿生物特征

作者:Sophia Bano,Brian Dromey,Francisco Vasconcelos,Raffaele Napolitano,Anna L. David,Donald M. Peebles,Danail Stoyanov 机构:Wellcome/EPSRC Centre for Interventional and Surgical Sciences (WEISS), Department of Computer Science, University College London, London, UK, Elizabeth Garrett Anderson Institute for Women's Health, University College 备注:Accepted at MICCAI 2021 链接:https://arxiv.org/abs/2107.05255 摘要:在怀孕期间,孕中期的超声检查可以根据标准化图表评估胎儿大小。为了实现可重复和准确的测量,超声医生需要识别胎儿解剖结构的三个标准二维平面(头部、腹部、股骨),并在图像上手动标记关键解剖标志点,以便进行准确的生物测量和胎儿体重估计。这可能是一项耗时且依赖操作者的任务,对受训中的超声医生尤其如此。计算机辅助技术有助于胎儿生物测量计算过程的自动化。在本文中,我们提出了一个统一的自动化框架,用于估计胎儿体重评估所需的全部测量值。该框架使用最先进的分割模型对关键的胎儿解剖结构进行语义分割,然后进行区域拟合和尺度恢复以完成生物测量估计。我们对分割算法进行了消融研究,通过在来自42例妊娠的349张超声标准平面图像数据集上进行4折交叉验证来展示其鲁棒性。此外,我们证明了分割性能最好的网络在生物测量估计上也更准确。我们还证明了临床测量值与预测的胎儿生物测量值之间的误差低于常规临床测量的允许误差。 摘要:During pregnancy, ultrasound examination in the second trimester can assess fetal size according to standardized charts. To achieve a reproducible and accurate measurement, a sonographer needs to identify three standard 2D planes of the fetal anatomy (head, abdomen, femur) and manually mark the key anatomical landmarks on the image for accurate biometry and fetal weight estimation. This can be a time-consuming operator-dependent task, especially for a trainee sonographer. Computer-assisted techniques can help in automating the fetal biometry computation process. In this paper, we present a unified automated framework for estimating all measurements needed for the fetal weight assessment. The proposed framework semantically segments the key fetal anatomies using state-of-the-art segmentation models, followed by region fitting and scale recovery for the biometry estimation. We present an ablation study of segmentation algorithms to show their robustness through 4-fold cross-validation on a dataset of 349 ultrasound standard plane images from 42 pregnancies. Moreover, we show that the network with the best segmentation performance tends to be more accurate for biometry estimation. Furthermore, we demonstrate that the error between clinically measured and predicted fetal biometry is lower than the permissible error during routine clinical measurements.

【4】 Predicting sepsis in multi-site, multi-national intensive care cohorts using deep learning 标题:利用深度学习预测多地点、多国家重症监护队列中的脓毒症

作者:Michael Moor,Nicolas Bennet,Drago Plecko,Max Horn,Bastian Rieck,Nicolai Meinshausen,Peter Bühlmann,Karsten Borgwardt 机构:Department of Biosystems Science and Engineering, SIB Swiss Institute of Bioinformatics, Department of Mathematics 链接:https://arxiv.org/abs/2107.05230 摘要:尽管有几十年的临床研究,脓毒症仍然是一个高死亡率、高发病率的全球性公共卫生危机。目前,当脓毒症被检出并确定潜在病原体时,器官损伤可能已经发展到不可逆转的阶段。因此,有效的脓毒症管理是高度时间敏感的。通过系统地分析重症监护室(ICU)中大量临床数据的趋势,对脓毒症的早期预测可以带来更早的病原体鉴定、耐药性检测以及有效的抗生素和支持治疗,从而成为挽救生命的措施。在这里,我们开发并验证了一个用于在ICU中预测脓毒症的机器学习(ML)系统。我们的分析是迄今为止利用ML预测脓毒症的最大规模多国、多中心ICU研究。我们的数据集包含$156,309$次独特的ICU入院,它们是来自三个国家的五个大型ICU数据库经提炼和协调后的子集。使用国际共识定义Sepsis-3,我们得到了逐小时解析的脓毒症标签注释,共计$26,734$次($17.1\%$)脓毒症住院。我们将我们的方法(一个深度自注意模型)与若干临床基线及ML基线进行了比较,并在数据库内部和数据库之间进行了广泛的内部和外部验证。平均而言,我们的模型能够以$0.847 \pm 0.050$(内部样本外验证)和$0.761 \pm 0.052$(外部验证)的AUROC预测脓毒症。对于$17\%$的统一患病率,在$80\%$的召回率下,我们的模型可以提前3.7小时以$39\%$的精确度检测出脓毒症患者。 摘要:Despite decades of clinical research, sepsis remains a global public health crisis with high mortality, and morbidity. Currently, when sepsis is detected and the underlying pathogen is identified, organ damage may have already progressed to irreversible stages. Effective sepsis management is therefore highly time-sensitive. By systematically analysing trends in the plethora of clinical data available in the intensive care unit (ICU), an early prediction of sepsis could lead to earlier pathogen identification, resistance testing, and effective antibiotic and supportive treatment, and thereby become a life-saving measure. Here, we developed and validated a machine learning (ML) system for the prediction of sepsis in the ICU. Our analysis represents the largest multi-national, multi-centre in-ICU study for sepsis prediction using ML to date. Our dataset contains $156,309$ unique ICU admissions, which represent a refined and harmonised subset of five large ICU databases originating from three countries. Using the international consensus definition Sepsis-3, we derived hourly-resolved sepsis label annotations, amounting to $26,734$ ($17.1\%$) septic stays. We compared our approach, a deep self-attention model, to several clinical baselines as well as ML baselines and performed an extensive internal and external validation within and across databases. On average, our model was able to predict sepsis with an AUROC of $0.847 \pm 0.050$ (internal out-of sample validation) and $0.761 \pm 0.052$ (external validation). For a harmonised prevalence of $17\%$, at $80\%$ recall our model detects septic patients with $39\%$ precision 3.7 hours in advance.

【5】 Remote Blood Oxygen Estimation From Videos Using Neural Networks 标题:基于神经网络的视频远程血氧估计

作者:Joshua Mathew,Xin Tian,Min Wu,Chau-Wai Wong 机构:NC State University, Univeristy of Maryland 链接:https://arxiv.org/abs/2107.05087 摘要:血氧饱和度(SpO$_2$)是呼吸功能的重要指标,在COVID-19大流行期间受到越来越多的关注。临床研究结果表明,COVID-19患者有可能在出现任何明显症状之前就表现出显著偏低的SpO$_2$。摄像头的普及促使研究人员研究利用视频监测SpO$_2$的方法。以前大多数涉及智能手机的方案都是基于接触的:它们需要指尖覆盖手机的摄像头和附近的光源,以捕捉被照亮组织重新发出的光。本文提出了第一种基于卷积神经网络、使用智能手机摄像头的非接触式SpO$_2$估计方案。该方案通过分析参与者手部的视频进行生理感知,既方便又舒适,既能保护参与者的隐私,也允许他们戴着口罩。我们受SpO$_2$测量的光生理模型启发设计了神经网络结构,并通过可视化通道组合的权重来展示其可解释性。我们提出的模型优于为基于接触的SpO$_2$测量设计的最新模型,显示了所提方法对公共卫生的潜在贡献。我们还分析了皮肤类型和手的正反面对SpO$_2$估计性能的影响。 摘要:Blood oxygen saturation (SpO$_2$) is an essential indicator of respiratory functionality and is receiving increasing attention during the COVID-19 pandemic. Clinical findings show that it is possible for COVID-19 patients to have significantly low SpO$_2$ before any obvious symptoms. The prevalence of cameras has motivated researchers to investigate methods for monitoring SpO$_2$ using videos. Most prior schemes involving smartphones are contact-based: They require a fingertip to cover the phone's camera and the nearby light source to capture re-emitted light from the illuminated tissue. In this paper, we propose the first convolutional neural network based noncontact SpO$_2$ estimation scheme using smartphone cameras. The scheme analyzes the videos of a participant's hand for physiological sensing, which is convenient and comfortable, and can protect their privacy and allow for keeping face masks on. We design our neural network architectures inspired by the optophysiological models for SpO$_2$ measurement and demonstrate the explainability by visualizing the weights for channel combination. Our proposed models outperform the state-of-the-art model that is designed for contact-based SpO$_2$ measurement, showing the potential of our proposed method to contribute to public health. We also analyze the impact of skin type and the side of a hand on SpO$_2$ estimation performance.

【6】 Prediction of concept lengths for fast concept learning in description logics 标题:描述逻辑中用于快速概念学习的概念长度预测

作者:N'Dah Jean Kouagou,Stefan Heindorf,Caglar Demir,Axel-Cyrille Ngonga Ngomo 机构:Paderborn University 备注:16 pages, 4 figures, 7 tables 链接:https://arxiv.org/abs/2107.04911 摘要:基于细化算子的概念学习方法利用偏序的解空间来计算概念,并将其用作针对个体的二元分类模型。然而,对于复杂的学习问题,这些方法所张成的细化树很容易增长到数百万个节点。这导致基于细化的方法往往无法高效地找到最优概念。在本文中,我们提出了一种用于学习概念长度的有监督机器学习方法,它可以预测目标概念的长度,从而有助于在概念学习过程中缩减搜索空间。为实现这一目标,我们比较了四种神经结构,并在四个基准知识图谱——Carcinogenesis、Mutagenesis、Semantic Bible、Family Benchmark——上对它们进行了评估。评估结果表明,递归神经网络结构在概念长度预测上表现最好,F-measure高达92%。我们表明,将我们的概念长度预测器集成到CELOE(本体工程类表达式学习器)算法中,可以将CELOE的运行时间最多加速13.4倍,而其生成结果的质量没有任何显著变化。为保证可复现性,我们在公共GitHub存储库中提供了实现:https://github.com/ConceptLengthLearner/ReproducibilityRepo 摘要:Concept learning approaches based on refinement operators explore partially ordered solution spaces to compute concepts, which are used as binary classification models for individuals. However, the refinement trees spanned by these approaches can easily grow to millions of nodes for complex learning problems. This leads to refinement-based approaches often failing to detect optimal concepts efficiently. In this paper, we propose a supervised machine learning approach for learning concept lengths, which allows predicting the length of the target concept and therefore facilitates the reduction of the search space during concept learning. To achieve this goal, we compare four neural architectures and evaluate them on four benchmark knowledge graphs--Carcinogenesis, Mutagenesis, Semantic Bible, Family Benchmark. Our evaluation results suggest that recurrent neural network architectures perform best at concept length prediction with an F-measure of up to 92%. We show that integrating our concept length predictor into the CELOE (Class Expression Learner for Ontology Engineering) algorithm improves CELOE's runtime by a factor of up to 13.4 without any significant changes to the quality of the results it generates. For reproducibility, we provide our implementation in the public GitHub repository at https://github.com/ConceptLengthLearner/ReproducibilityRepo

【7】 Improving Inductive Link Prediction Using Hyper-Relational Facts 标题:利用超关系事实改进归纳链接预测

作者:Mehdi Ali,Max Berrendorf,Mikhail Galkin,Veronika Thost,Tengfei Ma,Volker Tresp,Jens Lehmann 机构:Smart Data Analytics Group, University of Bonn, Germany, Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Dresden, Germany, Ludwig-Maximilians-Universität München, Munich, Germany, Mila, McGill University 链接:https://arxiv.org/abs/2107.04894 摘要:多年来,知识图谱(KG)上的链接预测一直是一项纯直推式(transductive)的任务,不允许对未见实体进行推理。最近,越来越多的工作致力于探索半归纳和完全归纳的场景,从而能够对未见的和新出现的实体进行推理。尽管如此,所有这些方法都只考虑基于三元组的KG,而其更丰富的对应物——超关系KG(例如Wikidata)——尚未得到适当的研究。在这项工作中,我们对不同的归纳设置进行了分类,并借助图神经网络的最新进展,在广泛的半归纳与完全归纳链接预测任务上研究采用超关系KG的好处。我们在一组新基准上的实验表明,与仅使用三元组的基线相比,类型化边上的限定符(qualifier)可以带来6%绝对增益的性能提升(以Hits@10指标计)。我们的代码可从\url{https://github.com/mali-git/hyper_relational_ilp}获得。 摘要:For many years, link prediction on knowledge graphs (KGs) has been a purely transductive task, not allowing for reasoning on unseen entities. Recently, increasing efforts are put into exploring semi- and fully inductive scenarios, enabling inference over unseen and emerging entities. Still, all these approaches only consider triple-based KGs, whereas their richer counterparts, hyper-relational KGs (e.g., Wikidata), have not yet been properly studied. In this work, we classify different inductive settings and study the benefits of employing hyper-relational KGs on a wide range of semi- and fully inductive link prediction tasks powered by recent advancements in graph neural networks. Our experiments on a novel set of benchmarks show that qualifiers over typed edges can lead to performance improvements of 6% of absolute gains (for the Hits@10 metric) compared to triple-only baselines. Our code is available at \url{https://github.com/mali-git/hyper_relational_ilp}.

【8】 Kernel Mean Estimation by Marginalized Corrupted Distributions 标题:边缘污染分布的核均值估计

作者:Xiaobo Xia,Shuo Shan,Mingming Gong,Nannan Wang,Fei Gao,Haikun Wei,Tongliang Liu 机构:The University of Sydney; ,Southeast University;, The University of Melbourne; ,Xidian University;, Hangzhou Dianzi University 链接:https://arxiv.org/abs/2107.04855 摘要:估计再生核Hilbert空间中的核均值是许多核学习算法的关键。给定一个有限样本,目标核均值的标准估计是经验均值。以往的研究表明,收缩方法可以构造出更好的估计量。在这项工作中,我们提出了一种新的核均值估计方法,称为边缘化核均值估计方法,它在已知分布的噪声下估计核均值。理论上,我们证明了边缘化核均值估计在核均值估计中引入了隐正则化。实验结果表明,边缘化核均值估计得到的估计误差比现有估计要小得多。 摘要:Estimating the kernel mean in a reproducing kernel Hilbert space is a critical component in many kernel learning algorithms. Given a finite sample, the standard estimate of the target kernel mean is the empirical average. Previous works have shown that better estimators can be constructed by shrinkage methods. In this work, we propose to corrupt data examples with noise from known distributions and present a new kernel mean estimator, called the marginalized kernel mean estimator, which estimates kernel mean under the corrupted distribution. Theoretically, we show that the marginalized kernel mean estimator introduces implicit regularization in kernel mean estimation. Empirically, we show on a variety of datasets that the marginalized kernel mean estimator obtains much lower estimation error than the existing estimators.
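下面以蒙特卡洛方式给出"边缘化"核均值估计的一个极简示意(对高斯核加高斯噪声实际上存在闭式解,论文的理论分析也更一般;此处的核带宽、噪声规模与副本数均为便于说明的假设):对每个样本注入来自已知分布的噪声副本,并对核特征取平均。

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (50, 2))               # 有限样本
grid = rng.normal(0, 1, (200, 2))           # 在这些点上评估核均值嵌入

def rbf(A, B, gamma=0.5):
    """RBF 核矩阵 k(a, b) = exp(-gamma * ||a - b||^2)。"""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

emp_mean = rbf(grid, X).mean(axis=1)        # 标准的经验核均值估计

n_copies, sigma = 20, 0.3                   # 已知噪声分布 N(0, sigma^2 I) 的副本
Xc = X[None] + rng.normal(0, sigma, (n_copies,) + X.shape)
marg_mean = rbf(grid, Xc.reshape(-1, 2)).mean(axis=1)   # 边缘化核均值估计
print(np.abs(emp_mean - marg_mean).max())   # 两种估计的最大差异
```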

【9】 Predicting Risk-adjusted Returns using an Asset Independent Regime-switching Model 标题:利用资产独立的制度转换模型预测风险调整后的收益

作者:Nicklas Werge 机构:LPSM, Sorbonne Université, place Jussieu, Paris, France 链接:https://arxiv.org/abs/2107.05535 摘要:随着时间的推移,金融市场往往会在不同的市场制度之间切换,这使得基于平稳性的模型难以为继。基于隐马尔可夫模型,我们构造了一个独立于资产类别的制度切换模型,用于风险调整后收益的预测。该框架可以区分商品、货币、股票和固定收益市场等多种金融市场中的市场制度。所提方法采用的"粘性"特征直接影响制度的粘性,从而改变换手率水平。我们通过分析近20年金融市场的日度变化,考察了我们的风险调整收益预测指标。对样本外观测的实证研究表明,该方法能够准确检测牛市、熊市和高波动期,在保持较好换手率水平的同时提高风险调整后收益。 摘要:Financial markets tend to switch between various market regimes over time, making stationarity-based models unsustainable. We construct a regime-switching model independent of asset classes for risk-adjusted return predictions based on hidden Markov models. This framework can distinguish between market regimes in a wide range of financial markets such as the commodity, currency, stock, and fixed income market. The proposed method employs sticky features that directly affect the regime stickiness and thereby changing turnover levels. An investigation of our metric for risk-adjusted return predictions is conducted by analyzing daily financial market changes for almost twenty years. Empirical demonstrations of out-of-sample observations obtain an accurate detection of bull, bear, and high volatility periods, improving risk-adjusted returns while keeping a preferable turnover level.
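下面给出"带粘性自转移的隐马尔可夫制度推断"的极简示意(两状态高斯HMM的前向滤波;参数 kappa、各制度的均值与波动率均为假设,论文的模型设定与估计方法以原文为准)。kappa 越大,制度越"粘"、切换越少,从而换手率越低:

```python
import numpy as np

def forward_filter(returns, mu, sigma, kappa=0.9):
    """两状态(或 K 状态)高斯 HMM 的前向滤波,返回逐时刻最可能的制度。"""
    K = len(mu)
    A = np.full((K, K), (1 - kappa) / (K - 1))
    np.fill_diagonal(A, kappa)                       # 粘性转移矩阵
    alpha = np.full(K, 1.0 / K)
    path = []
    for r in returns:
        lik = np.exp(-0.5 * ((r - mu) / sigma) ** 2) / sigma   # 各制度下的似然
        alpha = lik * (A.T @ alpha)
        alpha /= alpha.sum()                         # 归一化得到滤波概率
        path.append(alpha.argmax())
    return np.array(path)

rng = np.random.default_rng(0)
rets = np.r_[rng.normal(0.001, 0.01, 200),           # 低波动制度
             rng.normal(-0.002, 0.03, 200)]          # 高波动制度
regimes = forward_filter(rets, mu=np.array([0.001, -0.002]),
                         sigma=np.array([0.01, 0.03]))
print("切换点附近的制度序列:", regimes[195:205])
```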

【10】 Deep Risk Model: A Deep Learning Solution for Mining Latent Risk Factors to Improve Covariance Matrix Estimation 标题:深度风险模型:挖掘潜在风险因素改进协方差矩阵估计的深度学习解决方案

作者:Hengxu Lin,Dong Zhou,Weiqing Liu,Jiang Bian 机构:Columbia Business School, New York, United States, Microsoft Research, Beijing, China 链接:https://arxiv.org/abs/2107.05201 摘要:建模和管理投资组合风险可能是实现增长并保持投资业绩的最重要步骤。在建立于Markowitz理论之上的现代投资组合构建框架中,需要股票收益的协方差矩阵来建模投资组合的风险。传统的协方差矩阵估计方法都基于人工设计的风险因子,而设计更好的风险因子以改进协方差估计往往需要耗费大量时间和精力。在这项工作中,我们将挖掘风险因子的探索表述为一个学习问题,并提出一种深度学习解决方案,用神经网络有效地"设计"风险因子。学习目标经过精心设定,以确保学得的风险因子能够有效解释股票收益,并具有期望的正交性和稳定性。我们在股票市场数据上的实验证明了该方法的有效性:我们的方法可以使以$R^2$衡量的解释方差提高$1.9\%$,同时降低全局最小方差投资组合的风险。增量分析进一步支持了我们对架构和学习目标的设计。 摘要:Modeling and managing portfolio risk is perhaps the most important step to achieve growing and preserving investment performance. Within the modern portfolio construction framework that built on Markowitz's theory, the covariance matrix of stock returns is required to model the portfolio risk. Traditional approaches to estimate the covariance matrix are based on human designed risk factors, which often requires tremendous time and effort to design better risk factors to improve the covariance estimation. In this work, we formulate the quest of mining risk factors as a learning problem and propose a deep learning solution to effectively "design" risk factors with neural networks. The learning objective is carefully set to ensure the learned risk factors are effective in explaining stock returns as well as have desired orthogonality and stability. Our experiments on the stock market data demonstrate the effectiveness of the proposed method: our method can obtain $1.9\%$ higher explained variance measured by $R^2$ and also reduce the risk of a global minimum variance portfolio. Incremental analysis further supports our design of both the architecture and the learning objective.
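下面是"用神经网络生成风险因子,学习目标同时包含解释收益的回归项与因子正交性惩罚"这一思路的极简示意(网络结构、惩罚权重与数据均为假设;论文中还有稳定性等其他目标项):

```python
import torch
import torch.nn as nn

n_stocks, n_feat, K = 500, 30, 10
net = nn.Sequential(nn.Linear(n_feat, 64), nn.ReLU(), nn.Linear(64, K))  # 特征 -> K 个因子暴露
beta = nn.Linear(K, 1)                      # 用因子暴露线性解释股票收益
opt = torch.optim.Adam(list(net.parameters()) + list(beta.parameters()), lr=1e-3)

x, ret = torch.randn(n_stocks, n_feat), torch.randn(n_stocks, 1)
for _ in range(200):
    f = net(x)                              # 每只股票的因子暴露 (n_stocks, K)
    f = (f - f.mean(0)) / (f.std(0) + 1e-6) # 标准化各因子
    gram = (f.T @ f) / n_stocks
    ortho = ((gram - torch.eye(K)) ** 2).sum()       # 正交性惩罚:Gram 阵靠近单位阵
    loss = ((beta(f) - ret) ** 2).mean() + 0.1 * ortho
    opt.zero_grad(); loss.backward(); opt.step()
print("最终损失:", loss.item())
```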

其他神经网络|深度学习|模型|建模(27篇)

【1】 Forster Decomposition and Learning Halfspaces with Noise 标题:带噪声的半空间的Forster分解与学习

作者:Ilias Diakonikolas,Daniel M. Kane,Christos Tzamos 机构:University of Wisconsin-Madison, University of California, San Diego 链接:https://arxiv.org/abs/2107.05582 摘要:Forster变换是将分布转化为具有良好反集中特性的分布的操作。虽然Forster变换并不总是存在,但是我们证明了任何分布都可以有效地分解为几个分布的不相交混合,Forster变换存在并且可以有效地计算。作为这一结果的主要应用,我们得到了第一个多项式时间算法,用于Massart噪声模型中半空间的分布无关PAC学习,具有强多项式样本复杂度,即独立于样本的比特复杂度。以前的学习算法都是用位复杂度多项式来表示样本复杂度的,尽管这种依赖性在理论上是不必要的。 摘要:A Forster transform is an operation that turns a distribution into one with good anti-concentration properties. While a Forster transform does not always exist, we show that any distribution can be efficiently decomposed as a disjoint mixture of few distributions for which a Forster transform exists and can be computed efficiently. As the main application of this result, we obtain the first polynomial-time algorithm for distribution-independent PAC learning of halfspaces in the Massart noise model with strongly polynomial sample complexity, i.e., independent of the bit complexity of the examples. Previous algorithms for this learning problem incurred sample complexity scaling polynomially with the bit complexity, even though such a dependence is not information-theoretically necessary.

【2】 ROBIN: A Robust Optical Binary Neural Network Accelerator 标题:Robin:一种鲁棒的光学二值神经网络加速器

作者:Febin P. Sunny,Asif Mirza,Mahdi Nikdast,Sudeep Pasricha 机构:Department of Electrical and Computer Engineering, Colorado State University 链接:https://arxiv.org/abs/2107.05530 摘要:与cpu和gpu相比,领域特定的神经网络加速器由于其更高的能量效率和推理性能而备受关注。这样的加速器非常适合于资源受限的嵌入式系统。然而,在这些加速器上映射复杂的神经网络模型仍然需要大量的能量和内存消耗,同时还需要很高的推理时间开销。二值化神经网络(BNNs)利用单比特权值,是在加速器上实现和部署神经网络模型的一种有效方法。本文提出了一种新型的光域BNN加速器ROBIN,它智能地集成了具有互补功能的异质微环谐振器光学器件,有效地实现了BNN的关键功能。我们在光学元件层面上进行详细的制程变异分析,探讨这些元件的有效校正调整,以及集成电路层面的最佳化以对抗热变异。因此,我们提出的ROBIN架构在执行BNN模型时具有健壮、节能、低延迟和高吞吐量等优点。我们的分析表明,ROBIN可以超越最著名的光学BNN加速器和许多电子加速器。具体来说,我们的高能效ROBIN设计比电子BNN加速器的每比特能量值低约4倍,比最近提出的光子BNN加速器的每比特能量值低约933倍,而高性能ROBIN设计比电子和光子BNN加速器的性能分别高约3倍和约25倍。 摘要:Domain specific neural network accelerators have garnered attention because of their improved energy efficiency and inference performance compared to CPUs and GPUs. Such accelerators are thus well suited for resource-constrained embedded systems. However, mapping sophisticated neural network models on these accelerators still entails significant energy and memory consumption, along with high inference time overhead. Binarized neural networks (BNNs), which utilize single-bit weights, represent an efficient way to implement and deploy neural network models on accelerators. In this paper, we present a novel optical-domain BNN accelerator, named ROBIN, which intelligently integrates heterogeneous microring resonator optical devices with complementary capabilities to efficiently implement the key functionalities in BNNs. We perform detailed fabrication-process variation analyses at the optical device level, explore efficient corrective tuning for these devices, and integrate circuit-level optimization to counter thermal variations. As a result, our proposed ROBIN architecture possesses the desirable traits of being robust, energy-efficient, low latency, and high throughput, when executing BNN models. Our analysis shows that ROBIN can outperform the best-known optical BNN accelerators and also many electronic accelerators. Specifically, our energy-efficient ROBIN design exhibits energy-per-bit values that are ~4x lower than electronic BNN accelerators and ~933x lower than a recently proposed photonic BNN accelerator, while a performance-efficient ROBIN design shows ~3x and ~25x better performance than electronic and photonic BNN accelerators, respectively.

【3】 Prequential MDL for Causal Structure Learning with Neural Networks 标题:神经网络因果结构学习的前序(Prequential)MDL

作者:Jorg Bornschein,Silvia Chiappa,Alan Malek,Rosemary Nan Ke 机构:DeepMind, London 链接:https://arxiv.org/abs/2107.05481 摘要:从观测中学习贝叶斯网络的结构和因果关系是科学技术多个领域的共同目标。我们证明了,当使用灵活且过参数化的神经网络来建模观测变量之间的条件概率分布时,前序最小描述长度原理(prequential MDL)可以用来为贝叶斯网络导出一个实用的评分函数。MDL是奥卡姆剃刀的一种具体体现,我们无需依赖诱导稀疏性的先验或其他必须调节的正则化项,就能获得合理而简约的图结构。在经验上,我们在合成数据和真实数据上取得了有竞争力的结果。即使变量之间存在强非线性关系,该评分也往往能恢复正确的结构;而在这种情形下,先前的方法往往难以应对并通常失败。此外,我们还讨论了当观测来自一个正在经历分布偏移的数据源时,前序评分与最近"从适应速度推断因果结构"的工作之间的联系。 摘要:Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data. The score often recovers the correct structure even in the presence of strongly nonlinear relationships between variables; a scenario where prior approaches struggle and usually fail. Furthermore we discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift.
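下面给出前序(prequential)码长思想的一个玩具示意(论文用神经网络建模条件分布,此处以在线拟合的线性高斯模型代替;在这种线性高斯玩具设定下两个方向的总码长近乎对称,论文正是依靠非线性模型打破对称):评分即"依次用仅基于已见数据训练的模型编码下一个观测"所需码长之和,总码长更短的图结构得分更高。

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(0, 1, 500)
y = 2.0 * x + rng.normal(0, 0.5, 500)       # 数据由 x -> y 生成

def preq_conditional(x, y, warmup=20):
    """条件分布 p(y|x) 的前序码长:每步仅用过去数据拟合,再编码下一个观测。"""
    total = 0.0
    for t in range(warmup, len(x)):
        a = np.dot(x[:t], y[:t]) / (np.dot(x[:t], x[:t]) + 1e-8)
        s = np.std(y[:t] - a * x[:t]) + 1e-8
        r = y[t] - a * x[t]
        total += 0.5 * np.log(2 * np.pi * s**2) + 0.5 * (r / s) ** 2
    return total

def preq_marginal(z, warmup=20):
    """边缘分布 p(z) 的前序码长(在线高斯模型)。"""
    total = 0.0
    for t in range(warmup, len(z)):
        m, s = z[:t].mean(), z[:t].std() + 1e-8
        total += 0.5 * np.log(2 * np.pi * s**2) + 0.5 * ((z[t] - m) / s) ** 2
    return total

L_xy = preq_marginal(x) + preq_conditional(x, y)   # 结构 x -> y 的总码长
L_yx = preq_marginal(y) + preq_conditional(y, x)   # 结构 y -> x 的总码长
print(L_xy, L_yx)   # 总码长更短的结构更受偏好
```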

【4】 Improving the Algorithm of Deep Learning with Differential Privacy 标题:利用差分隐私改进深度学习算法

作者:Mehdi Amian 机构:INRS-EMT, University of Quebec, Montreal, Canada 链接:https://arxiv.org/abs/2107.05457 摘要:本文针对深度学习模型,提出了一种对原始差分隐私随机梯度下降(DPSGD)算法的改进。作为动机,迄今为止,尽管隐私保护有着至关重要的必要性,但由于现有的隐私保护组件会严重损害模型效用,几乎没有最先进的机器学习算法采用它们。本研究的思想自然且可解释,有助于在效用方面相对现有最优方法取得提升。所提技术的另一个特点是简单,这使它更加自然,也更适合现实世界、尤其是商业应用。其直觉是:出于隐私原因修剪并均衡个体之间的巨大差异,同时为追求性能而保留相对的个体差异。本文提出的思想还可以应用于递归神经网络(RNN)以解决梯度爆炸问题。我们将该算法应用于基准数据集MNIST和CIFAR-10上的分类任务,并计算了效用度量。结果优于原始工作。 摘要:In this paper, an adjustment to the original differentially private stochastic gradient descent (DPSGD) algorithm for deep learning models is proposed. As a matter of motivation, to date, almost no state-of-the-art machine learning algorithm hires the existing privacy protecting components due to otherwise serious compromise in their utility despite the vital necessity. The idea in this study is natural and interpretable, contributing to improve the utility with respect to the state-of-the-art. Another property of the proposed technique is its simplicity which makes it again more natural and also more appropriate for real world and specially commercial applications. The intuition is to trim and balance out wild individual discrepancies for privacy reasons, and at the same time, to preserve relative individual differences for seeking performance. The idea proposed here can also be applied to the recurrent neural networks (RNN) to solve the gradient exploding problem. The algorithm is applied to benchmark datasets MNIST and CIFAR-10 for a classification task and the utility measure is calculated. The results outperformed the original work.
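作为背景,下面给出原始DPSGD单步更新(逐样本梯度裁剪 + 高斯加噪)的极简示意——论文提出的改动正是在这一基础上"修剪并均衡"个体梯度差异,其具体调整方式以原文为准;此处的裁剪阈值与噪声乘子均为假设:

```python
import torch
import torch.nn as nn

model = nn.Linear(20, 2)
loss_fn = nn.CrossEntropyLoss()
clip_C, sigma, lr = 1.0, 1.0, 0.1          # 裁剪阈值、噪声乘子、学习率(假设值)

x, y = torch.randn(16, 20), torch.randint(0, 2, (16,))
grads = [torch.zeros_like(p) for p in model.parameters()]
for i in range(len(x)):                     # 逐样本计算并裁剪梯度
    model.zero_grad()
    loss_fn(model(x[i:i+1]), y[i:i+1]).backward()
    g = [p.grad.detach().clone() for p in model.parameters()]
    norm = torch.sqrt(sum((gi ** 2).sum() for gi in g))
    scale = torch.clamp(clip_C / (norm + 1e-8), max=1.0)
    for acc, gi in zip(grads, g):
        acc += gi * scale

with torch.no_grad():                       # 聚合、加噪、更新参数
    for p, acc in zip(model.parameters(), grads):
        noisy = (acc + sigma * clip_C * torch.randn_like(acc)) / len(x)
        p -= lr * noisy
```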

【5】 PonderNet: Learning to Ponder 标题:PonderNet:学会思考

作者:Andrea Banino,Jan Balaguer,Charles Blundell 机构:DeepMind, London, UK 备注:16 pages, 2 figures, 2 tables, 8th ICML Workshop on Automated Machine Learning (2021) 链接:https://arxiv.org/abs/2107.05407 摘要:在标准的神经网络中,计算量随输入的大小而增长,但不随所学习问题的复杂性而增长。为了克服这一限制,我们引入了PonderNet,这是一种能根据手头问题的复杂性来调整计算量的新算法。PonderNet端到端地学习计算步骤的数量,以在训练预测精度、计算成本和泛化之间实现有效的折衷。在一个复杂的合成问题上,PonderNet显著超越了以前的自适应计算方法,并且在传统神经网络失败的外推测试中取得了成功。此外,我们的方法以更少的计算量达到了真实世界问答数据集上当前最优的结果。最后,PonderNet在一项旨在测试神经网络推理能力的复杂任务上取得了最新的结果。 摘要:In standard neural networks the amount of computation used grows with the size of the inputs, but not with the complexity of the problem being learnt. To overcome this limitation we introduce PonderNet, a new algorithm that learns to adapt the amount of computation based on the complexity of the problem at hand. PonderNet learns end-to-end the number of computational steps to achieve an effective compromise between training prediction accuracy, computational cost and generalization. On a complex synthetic problem, PonderNet dramatically improves performance over previous adaptive computation methods and additionally succeeds at extrapolation tests where traditional neural networks fail. Also, our method matched the current state of the art results on a real world question and answering dataset, but using less compute. Finally, PonderNet reached state of the art results on a complex task designed to test the reasoning capabilities of neural networks.
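下面是PonderNet式"思考"循环的极简示意(每一步输出一个停机概率,预测损失按各步的停机概率加权;网络结构、最大步数均为假设,论文中还有与几何先验的KL正则等细节):

```python
import torch
import torch.nn as nn

class TinyPonder(nn.Module):
    """极简的 PonderNet 式循环:逐步计算并输出每步的停机概率。"""
    def __init__(self, dim=16, max_steps=5):
        super().__init__()
        self.cell = nn.GRUCell(dim, dim)
        self.out = nn.Linear(dim, 1)
        self.halt = nn.Linear(dim, 1)
        self.max_steps = max_steps

    def forward(self, x):
        h = torch.zeros(x.size(0), x.size(1))
        p_not_halted = torch.ones(x.size(0))
        ys, ps = [], []
        for n in range(self.max_steps):
            h = self.cell(x, h)
            lam = torch.sigmoid(self.halt(h)).squeeze(-1)
            if n == self.max_steps - 1:
                lam = torch.ones_like(lam)          # 最后一步必定停机
            ps.append(p_not_halted * lam)           # 恰在第 n 步停机的概率
            p_not_halted = p_not_halted * (1 - lam)
            ys.append(self.out(h).squeeze(-1))
        return torch.stack(ys, 1), torch.stack(ps, 1)

x, target = torch.randn(8, 16), torch.randn(8)
y_n, p_n = TinyPonder()(x)
loss = (p_n * (y_n - target[:, None]) ** 2).sum(1).mean()  # 停机分布下的期望损失
print(loss.item())
```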

【6】 Learning Expected Emphatic Traces for Deep RL 标题:面向深度RL的期望强调迹(Emphatic Traces)学习

作者:Ray Jiang,Shangtong Zhang,Veronica Chelu,Adam White,Hado van Hasselt 机构:DeepMind, London, UK, University of Oxford, Oxford, UK, McGill University, Montreal, QC, Canada, Edmonton, Canada 链接:https://arxiv.org/abs/2107.05405 摘要:离策略采样和经验回放是提高样本效率、扩展无模型时序差分(TD)学习方法的关键。当与函数逼近(如神经网络)相结合时,这一组合被称为"致命三元组",具有潜在的不稳定性。近年来的研究表明,将强调加权(emphatic weighting)与多步更新相结合,可以在大规模场景下获得稳定且良好的性能。然而,这类方法通常仅限于按顺序采样完整轨迹,以便计算所需的强调权重。在本文中,我们研究如何将强调权重与从回放缓冲区采样的非顺序离线数据相结合。我们提出了一种可与回放相结合的多步强调加权,以及一种时间反转的$n$步TD学习算法来学习所需的强调权重。我们证明了与先前方法相比,这些状态加权在降低方差的同时提供了收敛性保证。我们在Atari 2600视频游戏上对该方法进行了大规模测试,观察到新的X-ETD($n$)智能体优于基线智能体,突出了我们方法的可扩展性和广泛适用性。 摘要:Off-policy sampling and experience replay are key for improving sample efficiency and scaling model-free temporal difference learning methods. When combined with function approximation, such as neural networks, this combination is known as the deadly triad and is potentially unstable. Recently, it has been shown that stability and good performance at scale can be achieved by combining emphatic weightings and multi-step updates. This approach, however, is generally limited to sampling complete trajectories in order, to compute the required emphatic weighting. In this paper we investigate how to combine emphatic weightings with non-sequential, off-line data sampled from a replay buffer. We develop a multi-step emphatic weighting that can be combined with replay, and a time-reversed $n$-step TD learning algorithm to learn the required emphatic weighting. We show that these state weightings reduce variance compared with prior approaches, while providing convergence guarantees. We tested the approach at scale on Atari 2600 video games, and observed that the new X-ETD($n$) agent improved over baseline agents, highlighting both the scalability and broad applicability of our approach.

【7】 A Flexible Multi-Task Model for BERT Serving 标题:一种面向BERT服务的柔性多任务模型

作者:Tianwen Wei,Jianwei Qi,Shenghuan He 机构:Xiaomi AI 链接:https://arxiv.org/abs/2107.05377 摘要:在这个演示中,我们提出了一个高效的基于BERT的多任务(MT)框架,特别适合任务的迭代式和增量式开发。该框架基于部分微调的思想,即只微调BERT的顶部若干层,而冻结其他层。对于每个任务,我们使用部分微调独立地训练一个单任务(ST)模型。然后利用知识蒸馏压缩每个ST模型中任务特定的层。这些压缩后的ST模型最终被合并为一个MT模型,使前者的冻结层在任务间共享。我们在八个GLUE任务上演示了我们的方法,证明它能够同时实现强大的性能和效率。我们已经在小米公司开发的商业AI助手小爱(XiaoAI)的话语理解系统中实现了该方法。我们估计,我们的模型将整体服务成本降低了86%。 摘要:In this demonstration, we present an efficient BERT-based multi-task (MT) framework that is particularly suitable for iterative and incremental development of the tasks. The proposed framework is based on the idea of partial fine-tuning, i.e. only fine-tune some top layers of BERT while keep the other layers frozen. For each task, we train independently a single-task (ST) model using partial fine-tuning. Then we compress the task-specific layers in each ST model using knowledge distillation. Those compressed ST models are finally merged into one MT model so that the frozen layers of the former are shared across the tasks. We exemplify our approach on eight GLUE tasks, demonstrating that it is able to achieve both strong performance and efficiency. We have implemented our method in the utterance understanding system of XiaoAI, a commercial AI assistant developed by Xiaomi. We estimate that our model reduces the overall serving cost by 86%.
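下面给出"部分微调"的极简示意(假设以Hugging Face transformers的BertModel为底座,且仅解冻最顶部2层;具体解冻层数、任务头结构均为假设,运行需要联网下载权重):

```python
import torch.nn as nn
from transformers import BertModel

bert = BertModel.from_pretrained("bert-base-uncased")
n_trainable_top = 2                                   # 只微调最顶部 2 层(假设值)

for p in bert.parameters():
    p.requires_grad = False                           # 先冻结全部参数
for layer in bert.encoder.layer[-n_trainable_top:]:   # 再解冻顶部若干层
    for p in layer.parameters():
        p.requires_grad = True

task_head = nn.Linear(bert.config.hidden_size, 3)     # 某个任务的分类头(假设)
trainable = [p for p in bert.parameters() if p.requires_grad]
print("可训练参数量:", sum(p.numel() for p in trainable))
# 冻结的底层在所有任务间共享,只有顶层与任务头随各任务独立训练并被蒸馏压缩
```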

【8】 Nonparametric Regression with Shallow Overparameterized Neural Networks Trained by GD with Early Stopping 标题:GD提前停止训练浅层过参数神经网络的非参数回归

作者:Ilja Kuzborskij,Csaba Szepesvári 机构:DeepMind, London; DeepMind, Canada and University of Alberta, Edmonton 备注:COLT 2021 链接:https://arxiv.org/abs/2107.05341 摘要:我们研究了梯度下降(GD)训练的过参数化浅层神经网络在有无标签噪声的情况下学习Lipschitz回归函数的能力。为了避免在有噪声标签时、训练到几乎零训练误差的神经网络在该函数类上不一致的问题,我们提出了一个允许我们证明最优速率的提前停止规则。这为Hu等人(2021)的结果提供了一种替代:后者研究了$\ell_2$-正则化GD在非参数回归中训练浅层网络的性能,且完全依赖于无限宽网络(神经切线核,NTK)近似。在这里,我们给出一个更简单的分析:它基于对输入空间的划分论证(如同1-最近邻规则的情形),并利用了由GD训练得到的神经网络对其输入是光滑的这一事实。在无噪声情形下,证明不依赖任何核化,可以视为有限宽度的结果。在有标签噪声的情形下,通过稍微修改证明,使用Yao、Rosasco和Caponnetto(2007)的技术来控制噪声。 摘要:We explore the ability of overparameterized shallow neural networks to learn Lipschitz regression functions with and without label noise when trained by Gradient Descent (GD). To avoid the problem that in the presence of noisy labels, neural networks trained to nearly zero training error are inconsistent on this class, we propose an early stopping rule that allows us to show optimal rates. This provides an alternative to the result of Hu et al. (2021) who studied the performance of $\ell_2$-regularized GD for training shallow networks in nonparametric regression which fully relied on the infinite-width network (Neural Tangent Kernel (NTK)) approximation. Here we present a simpler analysis which is based on a partitioning argument of the input space (as in the case of 1-nearest-neighbor rule) coupled with the fact that trained neural networks are smooth with respect to their inputs when trained by GD. In the noise-free case the proof does not rely on any kernelization and can be regarded as a finite-width result. In the case of label noise, by slightly modifying the proof, the noise is controlled using a technique of Yao, Rosasco, and Caponnetto (2007).

【9】 Learning interaction rules from multi-animal trajectories via augmented behavioral models 标题:通过增广行为模型从多动物轨迹中学习交互规则

作者:Keisuke Fujii,Naoya Takeishi,Kazushi Tsutsui,Emyo Fujioka,Nozomi Nishiumi,Ryooya Tanaka,Mika Fukushiro,Kaoru Ide,Hiroyoshi Kohno,Ken Yoda,Susumu Takahashi,Shizuko Hiryu,Yoshinobu Kawahara 机构:Nagoya University, RIKEN Center for Advanced Intelligence Project, JST PRESTO, University of Applied Sciences, and Arts Western Switzerland, Doshisha University, National Institute, for Basic Biology, Tokai University, Kyushu University 备注:22 pages, 4 figures 链接:https://arxiv.org/abs/2107.05326 摘要:从运动序列中提取生物个体的交互规律在各个领域都是一个挑战。格兰杰因果关系是分析观测时间序列数据中交互作用的一个实用框架;然而,这一框架忽视了动物行为生成过程的结构,这可能导致解释上的问题,有时还会导致对因果关系的错误评估。在本文中,我们提出了一个新的框架,通过以可解释的数据驱动模型增强基于理论的行为模型,从多动物轨迹中学习格兰杰因果关系。我们采用一种用神经网络增强由时变动力系统描述的不完备多智能体行为模型的方法。为了实现高效且可解释的学习,我们的模型利用了分离导航过程与运动过程的基于理论的架构,以及保证行为建模可靠性的理论引导正则化。这可以提供随时间变化的格兰杰因果效应的可解释迹象,即特定的其他个体何时导致了接近或分离。在使用合成数据集的实验中,我们的方法取得了优于各种基线的性能。随后我们分析了小鼠、苍蝇、鸟类和蝙蝠的多动物数据集,验证了我们的方法并获得了新的生物学见解。 摘要:Extracting the interaction rules of biological agents from moving sequences pose challenges in various domains. Granger causality is a practical framework for analyzing the interactions from observed time-series data; however, this framework ignores the structures of the generative process in animal behaviors, which may lead to interpretational problems and sometimes erroneous assessments of causality. In this paper, we propose a new framework for learning Granger causality from multi-animal trajectories via augmented theory-based behavioral models with interpretable data-driven models. We adopt an approach for augmenting incomplete multi-agent behavioral models described by time-varying dynamical systems with neural networks. For efficient and interpretable learning, our model leverages theory-based architectures separating navigation and motion processes, and the theory-guided regularization for reliable behavioral modeling. This can provide interpretable signs of Granger-causal effects over time, i.e., when specific others cause the approach or separation. In experiments using synthetic datasets, our method achieved better performance than various baselines. We then analyzed multi-animal datasets of mice, flies, birds, and bats, which verified our method and obtained novel biological insights.

【10】 HEMP: High-order Entropy Minimization for neural network comPression 标题:HEMP:神经网络压缩的高阶熵最小化算法

作者:Enzo Tartaglione,Stéphane Lathuilière,Attilio Fiandrotti,Marco Cagnazzo,Marco Grangetto 机构:University of Torino, Torino, Italy, Télécom Paris, Paris, France 链接:https://arxiv.org/abs/2107.05298 摘要:我们将量化人工神经网络的熵表示为一个可微函数,它可以作为正则项加入通过梯度下降最小化的代价函数中。我们的公式可以有效地扩展到一阶以上,并且与量化方案无关。随后对网络进行训练以最小化量化参数的熵,从而使这些参数可以通过熵编码得到最优压缩。我们在多个数据集上用我们的熵公式对知名网络结构进行了量化和压缩实验。与同类方法相比,我们的方法表现更优:得益于高阶熵估计,它对非均匀量化具有灵活性(我们使用Lloyd-Max量化),可扩展到任意阶熵的最小化,并且在压缩方面效率较高。我们表明,HEMP能够与其他旨在剪枝或量化模型本身的方法协同工作,在不损害模型性能的情况下,在存储大小可压缩性方面带来显著收益。 摘要:We formulate the entropy of a quantized artificial neural network as a differentiable function that can be plugged as a regularization term into the cost function minimized by gradient descent. Our formulation scales efficiently beyond the first order and is agnostic of the quantization scheme. The network can then be trained to minimize the entropy of the quantized parameters, so that they can be optimally compressed via entropy coding. We experiment with our entropy formulation at quantizing and compressing well-known network architectures over multiple datasets. Our approach compares favorably over similar methods, enjoying the benefits of higher order entropy estimate, showing flexibility towards non-uniform quantization (we use Lloyd-max quantization), scalability towards any entropy order to be minimized and efficiency in terms of compression. We show that HEMP is able to work in synergy with other approaches aiming at pruning or quantizing the model itself, delivering significant benefits in terms of storage size compressibility without harming the model's performance.
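下面是"把量化参数的熵写成可微正则项"这一思路的极简示意(一阶熵、均匀量化中心,软分配温度 tau 为假设;论文中扩展到高阶熵并使用Lloyd-Max中心):

```python
import torch

def soft_quantization_entropy(w, centers, tau=0.01):
    """把参数对量化中心的软分配平均为各中心的占用概率,返回其(一阶)熵。"""
    d2 = (w.view(-1, 1) - centers.view(1, -1)) ** 2
    assign = torch.softmax(-d2 / tau, dim=1)        # 每个参数对各中心的软分配
    p = assign.mean(0) + 1e-12                      # 各量化符号的经验概率
    return -(p * p.log()).sum()                     # 熵越小,熵编码后码长越短

w = torch.randn(1000, requires_grad=True)           # 某层的权重(示例)
centers = torch.linspace(-2, 2, 8)                  # 8 个量化中心(此处取均匀中心)
H = soft_quantization_entropy(w, centers)
H.backward()                                        # 可与任务损失一起反向传播
print("熵(nats):", H.item())
```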

【11】 OmniLytics: A Blockchain-based Secure Data Market for Decentralized Machine Learning 标题:OmniLytics:一个基于区块链的分布式机器学习安全数据市场

作者:Jiacheng Liang,Wensi Jiang,Songze Li 机构: 1Hong Kong University of Science and Technology, China 2University of Electronic Science and Technologyof China, China 3Central South University 备注:12 pages,5 figures, accepted by International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021(this http URL) 链接:https://arxiv.org/abs/2107.05252 摘要:我们提出OmniLytics,一个基于区块链的安全数据交易市场,用于机器学习应用。利用OmniLytics,许多分布式数据拥有者可以贡献他们的私有数据来共同训练一些模型拥有者所要求的ML模型,并获得对数据贡献的补偿。OmniLytics支持这样的模型训练,同时提供1)模型安全性,防止好奇的数据所有者;2) 针对模型和数据所有者的数据安全;3) 对恶意数据所有者提供错误结果以毒害模型训练的恢复能力;4)对恶意模型所有者意图逃避付款的恢复能力。OmniLytics作为智能合约在以太坊区块链上实现,以保证支付的原子性。在OmniLytics中,模型所有者在契约上发布加密的初始模型,在该模型上参与的数据所有者使用其私有数据计算梯度,并通过契约安全地聚合梯度。最后,契约补偿数据所有者,模型所有者解密聚合模型更新。我们在以太坊上实现了OmniLytics的工作原型,并对其在各种参数组合下的气体消耗和执行时间进行了大量的实验测量,证明了其计算量大、成本低、实用性强。 摘要:We propose OmniLytics, a blockchain-based secure data trading marketplace for machine learning applications. Utilizing OmniLytics, many distributed data owners can contribute their private data to collectively train a ML model requested by some model owners, and get compensated for data contribution. OmniLytics enables such model training while simultaneously providing 1) model security against curious data owners; 2) data security against curious model and data owners; 3) resilience to malicious data owners who provide faulty results to poison model training; and 4) resilience to malicious model owner who intents to evade the payment. OmniLytics is implemented as a smart contract on the Ethereum blockchain to guarantee the atomicity of payment. In OmniLytics, a model owner publishes encrypted initial model on the contract, over which the participating data owners compute gradients using their private data, and securely aggregate the gradients through the contract. Finally, the contract reimburses the data owners, and the model owner decrypts the aggregated model update. We implement a working prototype of OmniLytics on Ethereum, and perform extensive experiments to measure its gas cost and execution time under various parameter combinations, demonstrating its high computation and cost efficiency and strong practicality.

【12】 Dual Training of Energy-Based Models with Overparametrized Shallow Neural Networks 标题:基于能量模型的超参数化浅层神经网络的双重训练

作者:Carles Domingo-Enrich,Alberto Bietti,Marylou Gabrié,Joan Bruna,Eric Vanden-Eijnden 机构:Vanden-Eijndena, Courant Institute of Mathematical Sciences, New York University, Center for Data Science, New York University, Center for Computational Mathematics, Flatiron Institute 链接:https://arxiv.org/abs/2107.05134 摘要:基于能量的模型(EBMs)是一种生成模型,通常通过极大似然估计进行训练。由于需要对与能量相关的吉布斯分布进行采样,这种方法在训练能量非凸的一般情况下具有挑战性。利用一般的Fenchel对偶结果,我们导出了在主动(又称特征学习)和惰性状态下,具有浅超参数化神经网络能量的对偶到极大似然EBMs的变分原理。在主动状态下,这种对偶形式导致了一种训练算法,其中一个同时更新样本空间中的粒子和能量参数空间中的神经元。我们还考虑了该算法的一种变体,其中粒子有时在从数据集抽取的随机样本中重新启动,并且表明在每次迭代步骤中执行这些重启对应于得分匹配训练。在我们的对偶算法中使用中间参数设置,从而提供了一种在最大似然和分数匹配训练之间插值的方法。这些结果在简单的数值实验中得到了说明。 摘要:Energy-based models (EBMs) are generative models that are usually trained via maximum likelihood estimation. This approach becomes challenging in generic situations where the trained energy is nonconvex, due to the need to sample the Gibbs distribution associated with this energy. Using general Fenchel duality results, we derive variational principles dual to maximum likelihood EBMs with shallow overparametrized neural network energies, both in the active (aka feature-learning) and lazy regimes. In the active regime, this dual formulation leads to a training algorithm in which one updates concurrently the particles in the sample space and the neurons in the parameter space of the energy. We also consider a variant of this algorithm in which the particles are sometimes restarted at random samples drawn from the data set, and show that performing these restarts at every iteration step corresponds to score matching training. Using intermediate parameter setups in our dual algorithm thereby gives a way to interpolate between maximum likelihood and score matching training. These results are illustrated in simple numerical experiments.
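下面是"同时更新样本空间中的粒子与能量网络参数"这一训练思路的极简示意(对比散度风格的参数更新 + Langevin粒子更新;能量网络结构、步长与迭代数均为假设,论文的对偶形式与粒子重启细节以原文为准):

```python
import torch
import torch.nn as nn

energy = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 1))  # 浅层能量网络
opt = torch.optim.Adam(energy.parameters(), lr=1e-3)

data = torch.randn(256, 2) * 0.5 + 1.0       # 训练数据
particles = torch.randn(256, 2)              # 负相粒子(样本空间中的变量)
eps = 0.05                                   # Langevin 步长(假设值)

for _ in range(10):
    # (1) 粒子更新:沿能量梯度下降并注入噪声(Langevin 动力学)
    particles.requires_grad_(True)
    g = torch.autograd.grad(energy(particles).sum(), particles)[0]
    particles = (particles - eps * g
                 + (2 * eps) ** 0.5 * torch.randn_like(particles)).detach()
    # (2) 参数更新:降低数据能量、抬高粒子能量
    loss = energy(data).mean() - energy(particles).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print("E[data] - E[particles] =", loss.item())
```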

【13】 Machine Learning Challenges and Opportunities in the African Agricultural Sector -- A General Perspective 标题:机器学习在非洲农业部门的挑战和机遇--总体观点

作者:Racine Ly 机构:AKADEMIYA, Kigali, Rwanda 备注:This paper has been submitted as an internal discussion paper at AKADEMIYA2063. It has 13 pages and contains 4 images and 2 tables 链接:https://arxiv.org/abs/2107.05101 摘要:计算机能力的提高、算法技术的进步以及可用数据的显著增加,促成了人工智能(AI)技术的最新发展。它的一个分支——机器学习(ML)——在模仿视觉、语言和问题求解等人类智能特征方面表现出很强的能力。然而,正如以往的技术革命所表明的,其最显著的影响可能主要体现在那些并非该技术传统使用者的其他部门。农业部门对非洲经济至关重要;在气候变化时代,提高产量、减少损失和有效管理自然资源至关重要。机器学习是一种在预测方面具有附加价值的技术,因而有潜力降低各部门(本文指农业部门)的不确定性与风险。本文的目的是为面向非洲农业的基于ML的解决方案提供背景,并讨论其障碍。在第二部分中,我们从历史和技术的角度概述了ML技术及其主要驱动力。在第三部分中,我们简要回顾了ML目前在农业中的应用。最后,在第四部分中,我们讨论了非洲对ML日益增长的兴趣,以及在农业部门创建和使用基于ML的解决方案的潜在障碍。 摘要:The improvement of computers' capacities, advancements in algorithmic techniques, and the significant increase of available data have enabled the recent developments of Artificial Intelligence (AI) technology. One of its branches, called Machine Learning (ML), has shown strong capacities in mimicking characteristics attributed to human intelligence, such as vision, speech, and problem-solving. However, as previous technological revolutions suggest, their most significant impacts could be mostly expected on other sectors that were not traditional users of that technology. The agricultural sector is vital for African economies; improving yields, mitigating losses, and effective management of natural resources are crucial in a climate change era. Machine Learning is a technology with an added value in making predictions, hence the potential to reduce uncertainties and risk across sectors, in this case, the agricultural sector. The purpose of this paper is to contextualize and discuss barriers to ML-based solutions for African agriculture. In the second section, we provided an overview of ML technology from a historical and technical perspective and its main driving force. In the third section, we provided a brief review of the current use of ML in agriculture. Finally, in section 4, we discuss ML growing interest in Africa and the potential barriers to creating and using ML-based solutions in the agricultural sector.

【14】 SE-PSNet: Silhouette-based Enhancement Feature for Panoptic Segmentation Network 标题:SE-PSNet:面向全景分割网络的基于轮廓的增强特征

作者:Shuo-En Chang,Yi-Cheng Yang,En-Ting Lin,Pei-Yung Hsiao,Li-Chen Fu 机构:Department of Computer Science and Information Engineering, National Taiwan University, Taipei, Taiwan, Graduate Institute of Networking and Multimedia, National Taiwan University, Taipei, Taiwan 备注:Technical report 链接:https://arxiv.org/abs/2107.05093 摘要:近年来出现了将语义分割和实例分割相结合的全景分割任务,其目标是为每个像素分类并赋予对应的实例ID;本文针对该任务提出了一种解决方案。整体结构采用自下而上与自上而下相结合的方法,因此不仅性能更好,而且能够保持执行速度。网络主要关注掩模的质量。从以往工作可以看到,物体轮廓处更容易出现不平整,从而导致低质量的预测。因此,我们针对物体与背景的轮廓提出了增强特征及相应的损失函数,以改善掩模。同时,利用新提出的置信度分数来解决遮挡问题,使网络倾向于使用更高质量的掩模作为预测结果。为了验证我们的研究,我们在COCO数据集和CityScapes数据集上进行了实验,在推理速度较快的同时取得了有竞争力的结果。 摘要:Recently, there has been a panoptic segmentation task combining semantic and instance segmentation, in which the goal is to classify each pixel with the corresponding instance ID. In this work, we propose a solution to tackle the panoptic segmentation task. The overall structure combines the bottom-up method and the top-down method. Therefore, not only can there be better performance, but also the execution speed can be maintained. The network mainly pays attention to the quality of the mask. In the previous work, we can see that the uneven contour of the object is more likely to appear, resulting in low-quality prediction. Accordingly, we propose enhancement features and corresponding loss functions for the silhouette of objects and backgrounds to improve the mask. Meanwhile, we use the new proposed confidence score to solve the occlusion problem and make the network tend to use higher quality masks as prediction results. To verify our research, we used the COCO dataset and CityScapes dataset to do experiments and obtained competitive results with fast inference time.

【15】 Machine Learning based CVD Virtual Metrology in Mass Produced Semiconductor Process 标题:基于机器学习的大规模生产半导体CVD虚拟测量

作者:Yunsong Xie,Ryan Stearrett 机构:Samsung Austin Semiconductor Company, Austin, TX, USA 链接:https://arxiv.org/abs/2107.05071 摘要:针对基于机器学习的化学气相沉积(CVD)虚拟计量(VM),我们对数据插补、特征选择和回归算法这三个关键方面进行了交叉基准测试。结果表明,线性特征选择回归算法会严重欠拟合VM数据。由于在获得最佳精度时数据可用率仅约为70%,要取得更高的预测精度,数据插补也是必要的。本工作表明,将非线性特征选择与回归算法和最近邻数据插补算法相结合,可提供高达0.7的预测精度。这将使CVD工艺波动降低70%,预计可降低物理计量的频率,并使大规模生产的晶圆质量更高、更可靠。 摘要:A cross-benchmark has been done on three critical aspects, data imputing, feature selection and regression algorithms, for machine learning based chemical vapor deposition (CVD) virtual metrology (VM). The result reveals that linear feature selection regression algorithm would extensively under-fit the VM data. Data imputing is also necessary to achieve a higher prediction accuracy as the data availability is only ~70% when optimal accuracy is obtained. This work suggests a nonlinear feature selection and regression algorithm combined with nearest data imputing algorithm would provide a prediction accuracy as high as 0.7. This would lead to 70% reduced CVD processing variation, which is believed to will lead to reduced frequency of physical metrology as well as more reliable mass-produced wafer with improved quality.
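下面用scikit-learn在合成数据上给出摘要所述"最近邻插补 + 非线性特征选择 + 非线性回归"组合的一个示意流程(具体模型与超参数为假设,非论文原实现):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectFromModel
from sklearn.impute import KNNImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_regression(n_samples=300, n_features=40, n_informative=8, noise=5.0, random_state=0)
X[np.random.default_rng(0).random(X.shape) < 0.3] = np.nan   # 模拟约 70% 的数据可用率

pipe = Pipeline([
    ("impute", KNNImputer(n_neighbors=5)),   # 最近邻数据插补
    ("select", SelectFromModel(RandomForestRegressor(n_estimators=100, random_state=0))),  # 非线性特征选择
    ("model", RandomForestRegressor(n_estimators=200, random_state=0)),                    # 非线性回归
])
print(cross_val_score(pipe, X, y, cv=3, scoring="r2").mean())
```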

【16】 Learning from Crowds with Sparse and Imbalanced Annotations 标题:从具有稀疏和不平衡注释的人群中学习

作者:Ye Shi,Shao-Yuan Li,Sheng-Jun Huang 机构:Ministry of Industry and Information Technology Key Laboratory of Pattern Analysis and Machine, Intelligence College of Computer Science and Technology, Nanjing University of Aeronautics and, Astronautics, Nanjing, China 链接:https://arxiv.org/abs/2107.05039 摘要:传统的有监督学习需要训练数据的地面真值标签,而这些标签在很多情况下难以收集。最近,众包借助非专家人群,成为一种高效的标注解决方案。为了减少标注错误的影响,一种常见的做法是将每个实例分发给多个工作者,而每个工作者只标注数据的一个子集,由此产生"稀疏标注"(sparse annotation)现象。在本文中,我们注意到当遇到类别不平衡,即地面真值标签类别不平衡时,稀疏标注容易呈现偏态分布,从而严重影响学习算法。为了解决这个问题,我们提出了一种基于自训练的方法Self-Crowd,通过逐步加入高置信度的伪标注并重新平衡标注分布来应对。具体而言,我们提出一种分布感知的置信度度量来选择高置信度的伪标注,该度量采用重采样策略对少数类标注过采样、对多数类标注欠采样。在一个真实的众包图像分类任务上,我们证明了所提方法在整个训练过程中比分布无关的方法产生更均衡的标注,并在不同标注稀疏度水平下显著提升学习性能。 摘要:Traditional supervised learning requires ground truth labels for the training data, whose collection can be difficult in many cases. Recently, crowdsourcing has established itself as an efficient labeling solution through resorting to non-expert crowds. To reduce the labeling error effects, one common practice is to distribute each instance to multiple workers, whereas each worker only annotates a subset of data, resulting in the {\it sparse annotation} phenomenon. In this paper, we note that when meeting with class-imbalance, i.e., when the ground truth labels are {\it class-imbalanced}, the sparse annotations are prone to be skewly distributed, which thus can severely bias the learning algorithm. To combat this issue, we propose one self-training based approach named {\it Self-Crowd} by progressively adding confident pseudo-annotations and rebalancing the annotation distribution. Specifically, we propose one distribution aware confidence measure to select confident pseudo-annotations, which adopts the resampling strategy to oversample the minority annotations and undersample the majority annotations. On one real-world crowdsourcing image classification task, we show that the proposed method yields more balanced annotations throughout training than the distribution agnostic methods and substantially improves the learning performance at different annotation sparsity levels.
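下面给出摘要中"分布感知置信度 + 重采样"思想的一个极简示意(纯属假设性实现,具体度量形式以论文为准):

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=[2.0, 0.5], size=1000)   # 模型对两类的预测概率(类别不平衡)
counts = np.array([900.0, 100.0])                    # 当前偏斜的标注计数

preds = probs.argmax(axis=1)
conf = probs.max(axis=1)
weight = (counts.sum() / counts)[preds]   # 逆类频率:少数类权重更大(过采样少数、欠采样多数)
score = conf * weight                     # 分布感知的置信度
selected = np.argsort(score)[-100:]       # 选出得分最高的 100 个伪标注
print(np.bincount(preds[selected], minlength=2))   # 优先补充少数类,使整体标注分布更均衡
```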

【17】 Blending Pruning Criteria for Convolutional Neural Networks 标题:卷积神经网络的混合剪枝准则

作者:Wei He,Zhongzhan Huang,Mingfu Liang,Senwei Liang,Haizhao Yang 机构:Yang, Nanyang Technological University , Tsinghua University, Northwestern University , Purdue University 链接:https://arxiv.org/abs/2107.05033 摘要:卷积神经网络(CNN)在各种视觉应用中的进展引起了广泛关注,然而大多数CNN无法满足实际部署的严格要求。为了克服这一点,近来流行的网络剪枝是减少模型冗余的有效方法。但是,按照滤波器在不同剪枝准则上的"重要性"进行排序的结果可能并不一致:根据某一准则某个滤波器可能很重要,而根据另一准则它却是不必要的,这表明每个准则只是综合"重要性"的一个局部视角。基于这一动机,我们提出了一个通过挖掘准则多样性来整合现有滤波器剪枝准则的新框架。该框架包含两个阶段:准则聚类与滤波器重要性校准。首先,我们基于"重要性"得分的排序,通过逐层聚类来压缩剪枝准则;其次,在每个聚类内,我们为每个被选中的混合候选准则引入一个校准因子来调整其显著性,并通过进化算法搜索最优混合准则。在CIFAR-100和ImageNet基准上的定量结果表明,就剪枝后的紧凑模型性能而言,我们的框架优于最先进的基线。 摘要:The advancement of convolutional neural networks (CNNs) on various vision applications has attracted lots of attention. Yet the majority of CNNs are unable to satisfy the strict requirement for real-world deployment. To overcome this, the recent popular network pruning is an effective method to reduce the redundancy of the models. However, the ranking of filters according to their "importance" on different pruning criteria may be inconsistent. One filter could be important according to a certain criterion, while it is unnecessary according to another one, which indicates that each criterion is only a partial view of the comprehensive "importance". From this motivation, we propose a novel framework to integrate the existing filter pruning criteria by exploring the criteria diversity. The proposed framework contains two stages: Criteria Clustering and Filters Importance Calibration. First, we condense the pruning criteria via layerwise clustering based on the rank of "importance" score. Second, within each cluster, we propose a calibration factor to adjust their significance for each selected blending candidates and search for the optimal blending criterion via Evolutionary Algorithm. Quantitative results on the CIFAR-100 and ImageNet benchmarks show that our framework outperforms the state-of-the-art baselines, regarding the compact model performance after pruning.
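下面用SciPy给出第一阶段"准则聚类"的一个示意:对若干常见滤波器重要性准则的排序做秩相关聚类(准则集合与距离定义均为假设;第二阶段的进化搜索从略):

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
flat = rng.normal(size=(64, 27))   # 64 个玩具卷积滤波器(已展平)

criteria = {
    "l1": np.abs(flat).sum(axis=1),
    "l2": np.linalg.norm(flat, axis=1),
    "median_dist": np.linalg.norm(flat - np.median(flat, axis=0), axis=1),  # 到中位滤波器的距离(粗略代理)
    "random": rng.random(64),
}
names = list(criteria)
scores = np.stack([criteria[n] for n in names])

rho, _ = spearmanr(scores, axis=1)                  # 各准则"重要性"排序间的秩相关
dist = 1.0 - rho[np.triu_indices(len(names), k=1)]  # 压缩距离向量
labels = fcluster(linkage(dist, method="average"), t=2, criterion="maxclust")
print(dict(zip(names, labels)))   # 高度一致的准则(如 l1/l2)应聚为一簇,随机准则自成一簇
```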

【18】 Towards a Multimodal System for Precision Agriculture using IoT and Machine Learning 标题:基于物联网和机器学习的多模态精准农业系统

作者:Satvik Garg,Pradyumn Pundir,Himanshu Jindal,Hemraj Saini,Somya Garg 机构:Jaypee University of Information Technology, Solan, India, Deloitte Consulting LLP, New York, USA 备注:7 pages, this paper is accepted in the 12th ICCCNT 2021 conference at IIT Kharagpur, India. The final version of this paper will appear in the conference proceedings 链接:https://arxiv.org/abs/2107.04895 摘要:精准农业系统是一个新兴概念,指利用当前的信息与通信技术来管理农场,在提高产量数量与质量的同时减少所需的人力劳动。自动化需要汇集土壤、水分、光照、湿度、温度等传感器提供的信息,为操作者提供精确数据,帮助农民获得优质产量。在这项工作中,我们提出了一项纳入精准农业各类常见最先进方法的研究:利用物联网(IoT)等技术进行数据采集,利用机器学习进行作物损害预测,利用深度学习进行作物病害检测。利用物联网收集的数据用于测量土壤湿度水平以实现智能灌溉,并估算肥料的氮、磷、钾含量以获得最佳产量。在作物损害预测方面,采用了随机森林(RF)、轻量梯度提升机(LGBM)、XGBoost(XGB)、决策树(DT)和K近邻(KNN)等算法。随后,还训练了VGG16、Resnet50和DenseNet121等预训练卷积神经网络(CNN)模型,以检查作物是否感染某种疾病。 摘要:Precision agriculture system is an arising idea that refers to overseeing farms utilizing current information and communication technologies to improve the quantity and quality of yields while advancing the human work required. The automation requires the assortment of information given by the sensors such as soil, water, light, humidity, temperature for additional information to furnish the operator with exact data to acquire excellent yield to farmers. In this work, a study is proposed that incorporates all common state-of-the-art approaches for precision agriculture use. Technologies like the Internet of Things (IoT) for data collection, machine Learning for crop damage prediction, and deep learning for crop disease detection is used. The data collection using IoT is responsible for the measure of moisture levels for smart irrigation, n, p, k estimations of fertilizers for best yield development. For crop damage prediction, various algorithms like Random Forest (RF), Light gradient boosting machine (LGBM), XGBoost (XGB), Decision Tree (DT) and K Nearest Neighbor (KNN) are used. Subsequently, Pre-Trained Convolutional Neural Network (CNN) models such as VGG16, Resnet50, and DenseNet121 are also trained to check if the crop was tainted with some illness or not.

【19】 HOMRS: High Order Metamorphic Relations Selector for Deep Neural Networks 标题:HOMRS:面向深度神经网络的高阶蜕变关系选择器

作者:Florian Tambon,Giulio Antoniol,Foutse Khomh 备注:19 pages, 2 figures 链接:https://arxiv.org/abs/2107.04863 摘要:从医疗应用到自动驾驶汽车,深度神经网络(DNN)的应用正日益成为我们日常生活的一部分。DNN的传统验证依赖于准确性度量,然而对抗样本的存在凸显了这类度量的局限性,尤其当DNN被集成到安全关键系统时更令人担忧。本文提出HOMRS,一种通过从初始的基本蜕变关系集合自动构建一个小型优化的高阶蜕变关系集合来增强蜕变测试的方法。HOMRS的主干是多目标搜索,它借鉴了传统系统测试的思想,如代码覆盖率、测试用例多样性与路径多样性。我们将HOMRS应用于使用MNIST数据集的LeNet5 DNN,结果表明它能构建出一个小而有效的高阶变换集合,达到95%的杀死率。五位评分者在高阶变换前后对一个图像池进行了人工标注;Fleiss' Kappa和统计检验证实了这些变换具有蜕变性质。HOMRS构建的关系在面对对抗样本或分布外样本时同样有效:HOMRS检测出了92%的随机抽样分布外图像。HOMRS变换也适合在线实时使用。 摘要:Deep Neural Networks (DNN) applications are increasingly becoming a part of our everyday life, from medical applications to autonomous cars. Traditional validation of DNN relies on accuracy measures, however, the existence of adversarial examples has highlighted the limitations of these accuracy measures, raising concerns especially when DNN are integrated into safety-critical systems. In this paper, we present HOMRS, an approach to boost metamorphic testing by automatically building a small optimized set of high order metamorphic relations from an initial set of elementary metamorphic relations. HOMRS' backbone is a multi-objective search; it exploits ideas drawn from traditional systems testing such as code coverage, test case, and path diversity. We applied HOMRS to LeNet5 DNN with MNIST dataset and we report evidence that it builds a small but effective set of high order transformations achieving a 95% kill ratio. Five raters manually labeled a pool of images before and after high order transformation; Fleiss' Kappa and statistical tests confirmed that they are metamorphic properties. HOMRS built-in relations are also effective to confront adversarial or out-of-distribution examples; HOMRS detected 92% of randomly sampled out-of-distribution images. HOMRS transformations are also suitable for online real-time use.

【20】 Is a Single Model Enough? MuCoS: A Multi-Model Ensemble Learning for Semantic Code Search 标题:单个模型足够了吗?MuCoS:一种面向语义代码搜索的多模型集成学习

作者:Lun Du,Xiaozhou Shi,Yanlin Wang,Ensheng Shi,Shi Han,Dongmei Zhang 机构:Microsoft Research Asia, Beijing, China, Beijing University of Technology, Xi’an Jiaotong University 备注:5 pages 链接:https://arxiv.org/abs/2107.04773 摘要:近年来,深度学习方法由于能够更好地捕捉代码片段和搜索查询之间的语义关联,并且具有良好的性能,成为代码搜索的主流。然而,代码片段具有来自不同维度的不同信息,如业务逻辑、特定算法和硬件通信,因此单个代码表示模块很难涵盖所有方面。另一方面,由于一个特定的查询可能集中在一个或多个视角上,单个查询表示模块很难表示不同的用户意图。本文提出了一种用于语义代码搜索的多模型集成学习体系结构MuCoS。它结合了几个单独的学习者,每个学习者都强调代码片段的特定视角。我们在不同的数据集上训练个体学习者,这些数据集包含不同的代码信息,并且我们使用数据扩充策略来获得这些不同的数据集。然后,我们整合学习者来捕捉代码片段的综合特征。 摘要:Recently, deep learning methods have become mainstream in code search since they do better at capturing semantic correlations between code snippets and search queries and have promising performance. However, code snippets have diverse information from different dimensions, such as business logic, specific algorithm, and hardware communication, so it is hard for a single code representation module to cover all the perspectives. On the other hand, as a specific query may focus on one or several perspectives, it is difficult for a single query representation module to represent different user intents. In this paper, we propose MuCoS, a multi-model ensemble learning architecture for semantic code search. It combines several individual learners, each of which emphasizes a specific perspective of code snippets. We train the individual learners on different datasets which contain different perspectives of code information, and we use a data augmentation strategy to get these different datasets. Then we ensemble the learners to capture comprehensive features of code snippets.

【21】 Hack The Box: Fooling Deep Learning Abstraction-Based Monitors 标题:破解盒子:愚弄深度学习基于抽象的监视器

作者:Sara Hajj Ibrahim,Mohamed Nassar 机构: American University of Beirut (AUB), University of New Haven 链接:https://arxiv.org/abs/2107.04764 摘要:深度学习是一种采用深层概念层次结构的机器学习。深度学习分类器将输入层中最基础的概念版本与输出层中最抽象的概念版本(也称为类或标签)联系起来。然而,一旦在有限的类集合上完成训练,深度学习模型便无法表示某个给定输入不属于任何已知类、根本无法建立联系。正确地否定不相关类的预测是一个具有挑战性的问题,文献中已从多种途径加以解决。新颖性检测使深度学习能够对新颖/未见过的类输出"不知道"。尽管如此,新颖性检测的安全性方面尚未受到关注。在本文中,我们以基于抽象的新颖性检测为案例研究,表明它对对抗样本并不鲁棒。此外,我们还证明了构造对抗样本、在欺骗深度学习分类器的同时绕过新颖性检测监控的可行性。换句话说,这些监控盒子是可以被攻破的。我们证明新颖性检测本身最终也成为一个攻击面。 摘要:Deep learning is a type of machine learning that adapts a deep hierarchy of concepts. Deep learning classifiers link the most basic version of concepts at the input layer to the most abstract version of concepts at the output layer, also known as a class or label. However, once trained over a finite set of classes, a deep learning model does not have the power to say that a given input does not belong to any of the classes and simply cannot be linked. Correctly invalidating the prediction of unrelated classes is a challenging problem that has been tackled in many ways in the literature. Novelty detection gives deep learning the ability to output "do not know" for novel/unseen classes. Still, no attention has been given to the security aspects of novelty detection. In this paper, we consider the case study of abstraction-based novelty detection and show that it is not robust against adversarial samples. Moreover, we show the feasibility of crafting adversarial samples that fool the deep learning classifier and bypass the novelty detection monitoring at the same time. In other words, these monitoring boxes are hackable. We demonstrate that novelty detection itself ends up as an attack surface.

【22】 Multi-Agent Imitation Learning with Copulas 标题:基于Copulas的多Agent模拟学习

作者:Hongwei Wang,Lantao Yu,Zhangjie Cao,Stefano Ermon 机构:Computer Science Department, Stanford University, Stanford, CA , USA 备注:ECML-PKDD 2021. First two authors contributed equally 链接:https://arxiv.org/abs/2107.04750 摘要:多智能体模仿学习旨在通过学习观察与动作之间的映射,训练多个智能体从演示中执行任务,这对于理解物理、社会和团队博弈系统至关重要。然而,现有关于多智能体交互建模的大多数工作通常假设各智能体基于自身观察独立决策,忽略了智能体之间复杂的依赖关系。在本文中,我们提出使用copula——一种刻画随机变量间依赖关系的强大统计工具——来显式建模多智能体系统中的相关性与协调性。我们提出的模型能够分别学习刻画每个智能体局部行为模式的边缘分布,以及单独且完整刻画智能体间依赖结构的copula函数。在合成数据集和真实数据集上的大量实验表明,我们的模型在动作预测任务的各种场景中均优于最先进的基线,并能够生成接近专家演示的新轨迹。 摘要:Multi-agent imitation learning aims to train multiple agents to perform tasks from demonstrations by learning a mapping between observations and actions, which is essential for understanding physical, social, and team-play systems. However, most existing works on modeling multi-agent interactions typically assume that agents make independent decisions based on their observations, ignoring the complex dependence among agents. In this paper, we propose to use copula, a powerful statistical tool for capturing dependence among random variables, to explicitly model the correlation and coordination in multi-agent systems. Our proposed model is able to separately learn marginals that capture the local behavioral patterns of each individual agent, as well as a copula function that solely and fully captures the dependence structure among agents. Extensive experiments on synthetic and real-world datasets show that our model outperforms state-of-the-art baselines across various scenarios in the action prediction task, and is able to generate new trajectories close to expert demonstrations.
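下面以高斯copula为例,给出"边缘分布与依赖结构分开建模"这一思想的数值示意(论文中copula的具体参数化形式未在摘要中给出,此处仅为说明):

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(0)
a1 = rng.gamma(2.0, size=2000)                  # 智能体 1 的一维"动作",非高斯边缘
a2 = 0.8 * a1 + 0.2 * rng.normal(size=2000)     # 智能体 2 的动作,与 a1 相关
acts = np.column_stack([a1, a2])

u = (rankdata(acts, axis=0) - 0.5) / len(acts)  # 1) 边缘:经验 CDF(秩变换)压成均匀分布
z = norm.ppf(u)                                 # 2) copula:高斯化后估计相关矩阵
R = np.corrcoef(z, rowvar=False)

z_new = rng.multivariate_normal(np.zeros(2), R, size=5)   # 采样:先采相关高斯
u_new = norm.cdf(z_new)
samples = np.column_stack([np.quantile(acts[:, d], u_new[:, d]) for d in range(2)])
print(samples)                                  # 再经各自边缘的分位数映射回动作空间
```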

【23】 Lifelong Teacher-Student Network Learning 标题:终身师生网络学习

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:18 pages, 18 figures. In IEEE Transactions on Pattern Analysis and Machine Intelligence 链接:https://arxiv.org/abs/2107.04689 摘要:人类独特的认知能力在于从一系列经验中获得新知识和新技能的能力。与此相对,人工智能系统擅长的只是学习最后一个给定的任务,而无法记住过去学习过的数据库。我们提出了一种采用师生网络框架的新的终身学习方法。当学生模块使用一个新的给定数据库进行训练时,教师模块会提醒学生过去所学的信息。教师由一个生成对抗网络(GAN)实现,被训练来保存并回放与先前学习过的数据库的概率表示相对应的过去知识。同时,学生模块由一个变分自动编码器(VAE)实现,VAE从教师模块的输出和新的可用数据库中推断其潜在变量表示。此外,学生模块被训练来捕获跨不同领域的连续与离散的底层数据表示。所提出的终身学习框架可应用于有监督、半监督和无监督训练。代码见:https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning 摘要:A unique cognitive capability of humans consists in their ability to acquire new knowledge and skills from a sequence of experiences. Meanwhile, artificial intelligence systems are good at learning only the last given task without being able to remember the databases learnt in the past. We propose a novel lifelong learning methodology by employing a Teacher-Student network framework. While the Student module is trained with a new given database, the Teacher module would remind the Student about the information learnt in the past. The Teacher, implemented by a Generative Adversarial Network (GAN), is trained to preserve and replay past knowledge corresponding to the probabilistic representations of previously learn databases. Meanwhile, the Student module is implemented by a Variational Autoencoder (VAE) which infers its latent variable representation from both the output of the Teacher module as well as from the newly available database. Moreover, the Student module is trained to capture both continuous and discrete underlying data representations across different domains. The proposed lifelong learning framework is applied in supervised, semi-supervised and unsupervised training. The code is available~: \url{https://github.com/dtuzi123/Lifelong-Teacher-Student-Network-Learning}

【24】 Training Over-parameterized Models with Non-decomposable Objectives 标题:训练具有不可分解目标的过参数化模型

作者:Harikrishna Narasimhan,Aditya Krishna Menon 机构:Google Research, Mountain View, Google Research, New York 链接:https://arxiv.org/abs/2107.04641 摘要:许多现代机器学习应用有着复杂而微妙的设计目标,例如最小化最坏情况误差、满足给定的查准率或查全率目标,或施加群体公平性约束。优化此类不可分解目标的流行技术将问题归约为一系列代价敏感学习任务,再通过用样本相关的代价对训练损失重新加权来求解每个任务。我们指出,通过重新加权损失来纳入标签代价这一标准做法,在用于训练过参数化模型时可能产生不理想的结果。作为补救,我们提出了新的代价敏感损失,将经典的logit调整思想推广到更一般的代价矩阵。我们的损失是经过校准的,并且可以利用从教师模型蒸馏得到的标签进一步改进。通过在基准图像数据集上的实验,我们展示了该方法在以常见的鲁棒与约束优化为目标训练ResNet模型时的有效性。 摘要:Many modern machine learning applications come with complex and nuanced design goals such as minimizing the worst-case error, satisfying a given precision or recall target, or enforcing group-fairness constraints. Popular techniques for optimizing such non-decomposable objectives reduce the problem into a sequence of cost-sensitive learning tasks, each of which is then solved by re-weighting the training loss with example-specific costs. We point out that the standard approach of re-weighting the loss to incorporate label costs can produce unsatisfactory results when used to train over-parameterized models. As a remedy, we propose new cost-sensitive losses that extend the classical idea of logit adjustment to handle more general cost matrices. Our losses are calibrated, and can be further improved with distilled labels from a teacher model. Through experiments on benchmark image datasets, we showcase the effectiveness of our approach in training ResNet models with common robust and constrained optimization objectives.
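下面给出经典logit调整损失的极简示意(论文将其推广到一般代价矩阵并结合教师蒸馏,此处只演示最基本的形式):

```python
import torch
import torch.nn.functional as F

def logit_adjusted_loss(logits, targets, class_priors, tau=1.0):
    # 把类先验的对数加到 logits 上再做交叉熵:训练目标即被"校准"向平衡误差
    adjusted = logits + tau * torch.log(class_priors).unsqueeze(0)
    return F.cross_entropy(adjusted, targets)

logits = torch.randn(4, 3)
targets = torch.tensor([0, 2, 1, 2])
priors = torch.tensor([0.7, 0.2, 0.1])   # 长尾类分布
print(logit_adjusted_loss(logits, targets, priors))
```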

【25】 Learning Probabilistic Reward Machines from Non-Markovian Stochastic Reward Processes 标题:从非马尔可夫随机奖励过程中学习概率奖励机

作者:Alvaro Velasquez,Andre Beckus,Taylor Dohmen,Ashutosh Trivedi,Noah Topper,George Atia 机构: but can be 1Air Force Research Laboratory 2University of Colorado, Boul-der 3University of Central Florida 链接:https://arxiv.org/abs/2107.04633 摘要:在典型环境下,强化学习的成功部分取决于对奖励信号的马尔可夫假设,而奖励信号是agent学习最优策略的基础。近年来,奖励机的使用放宽了这一假设,使非马尔可夫奖励的结构化表示成为可能。特别地,这种表示可以用来扩充底层决策过程的状态空间,从而促进非马尔可夫强化学习。然而,这些奖励机器无法捕捉随机奖励信号的语义。在本文中,我们通过引入概率奖励机(PRMs)作为非马尔可夫随机奖励的表示,在这方面取得了进展。我们提出了一个算法来学习PRM从底层的决策过程,以及学习的PRM表示一个给定的决策策略。 摘要:The success of reinforcement learning in typical settings is, in part, predicated on underlying Markovian assumptions on the reward signal by which an agent learns optimal policies. In recent years, the use of reward machines has relaxed this assumption by enabling a structured representation of non-Markovian rewards. In particular, such representations can be used to augment the state space of the underlying decision process, thereby facilitating non-Markovian reinforcement learning. However, these reward machines cannot capture the semantics of stochastic reward signals. In this paper, we make progress on this front by introducing probabilistic reward machines (PRMs) as a representation of non-Markovian stochastic rewards. We present an algorithm to learn PRMs from the underlying decision process as well as to learn the PRM representation of a given decision-making policy.

【26】 SITHCon: A neural network robust to variations in input scaling on the time dimension 标题:SITHCon:一种对时间维上输入尺度变化具有鲁棒性的神经网络

作者:Brandon G. Jacques,Zoran Tiganj,Aakash Sarkar,Marc W. Howard,Per B. Sederberg 机构:Department of Psychology, University Of Virginia, Department of Computer Science, Indiana University, Department of Psychological and Brain Sciences, Boston University, University of Virginia 链接:https://arxiv.org/abs/2107.04616 摘要:在机器学习中,卷积神经网络(CNN)无论在计算机视觉还是在识别随时间延展的模式方面都有着极其重要的影响。在计算机视觉中,其部分灵活性来自于在卷积之上使用最大池化运算以获得平移不变性。在哺乳动物的大脑中,时间的神经表征使用一组时间基函数。关键的是,这些基函数似乎按几何级数排列,使得基集在对数时间上均匀分布。本文介绍了一种使用对数分布时间记忆的尺度不变时间历史卷积网络(SITHCon)。在对数分布的时间记忆上做最大池化,便得到时间上的尺度不变性。我们将SITHCon与时间卷积网络(TCN)进行了比较,结果表明,虽然两种网络都能学习单变量和多变量时间序列$f(t)$上的分类与回归问题,但只有SITHCon具有无需再训练即可推广到输入的重新缩放版本$f(at)$的特性。这一特性受神经科学与心理学研究结果的启发,可能造就能力截然不同的大规模网络,包括更快的训练和更强的泛化能力,且自由参数显著更少。 摘要:In machine learning, convolutional neural networks (CNNs) have been extremely influential in both computer vision and in recognizing patterns extended over time. In computer vision, part of the flexibility arises from the use of max-pooling operations over the convolutions to attain translation invariance. In the mammalian brain, neural representations of time use a set of temporal basis functions. Critically, these basis functions appear to be arranged in a geometric series such that the basis set is evenly distributed over logarithmic time. This paper introduces a Scale-Invariant Temporal History Convolution network (SITHCon) that uses a logarithmically-distributed temporal memory. A max-pool over a logarithmically-distributed temporal memory results in scale-invariance in time. We compare performance of SITHCon to a Temporal Convolution Network (TCN) and demonstrate that, although both networks can learn classification and regression problems on both univariate and multivariate time series $f(t)$, only SITHCon has the property that it generalizes without retraining to rescaled versions of the input $f(at)$. This property, inspired by findings from neuroscience and psychology, could lead to large-scale networks with dramatically different capabilities, including faster training and greater generalizability, even with significantly fewer free parameters.
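下面用一个小的数值例子说明"对数间隔时间基点 + 最大池化"为何带来时间尺度不变性:在对数间隔采样下,时间重缩放 f(t)→f(at) 恰好对应采样序列的平移,而跨位置取最大值对平移不变(信号与参数均为任意假设):

```python
import numpy as np

f = lambda t: np.exp(-t) * np.sin(3 * t)   # 任意玩具信号
c, K = 2 ** 0.25, 24
taps = c ** np.arange(K)                   # 按几何级数排列的时间基点(对数均匀)

orig = f(taps)
rescaled = f(2.0 * taps)                   # 时间缩放 a=2:由于 c**4 == 2,恰为 4 个基点的平移
shift = int(round(np.log(2.0) / np.log(c)))
print(np.allclose(rescaled[:-shift], orig[shift:]))        # True:重缩放 == 平移
print(np.isclose(orig.max(), rescaled.max(), atol=0.05))   # 跨位置 max-pool 后近似尺度不变
```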

【27】 Tropical cyclone intensity estimations over the Indian ocean using Machine Learning 标题:基于机器学习的印度洋热带气旋强度估计

作者:Koushik Biswas,Sandeep Kumar,Ashish Kumar Pandey 机构:Department of Computer Science, IIIT Delhi, New Delhi, India & Shaheed Bhagat Singh College, University of Delhi, Department of Mathematics, IIIT Delhi 备注:10 pages 链接:https://arxiv.org/abs/2107.05573 摘要:热带气旋是地球上最强大、破坏力最强的自然现象之一。热带风暴和暴雨会引发洪水,造成人员伤亡和经济损失;伴随气旋的毁灭性大风不仅严重影响沿海地区,甚至波及遥远的内陆。我们的研究聚焦于北印度洋热带气旋的强度估计,特别是气旋等级和最大持续地面风速(MSWS)。我们使用多种机器学习算法估计气旋等级与MSWS,并以起源海盆、日期、时间、纬度、经度、估计中心气压和气压降作为模型的属性。我们对类别型结果变量(气旋等级)使用多类分类模型,对连续变量MSWS使用回归模型。利用北印度洋28年的最佳路径数据,我们对等级的估计准确率达88%,对MSWS的估计均方根误差(RMSE)为2.3。对于更高的等级类别(5-7),平均准确率提高到98.84%。我们用北印度洋最近的两个热带气旋Vayu和Fani检验了我们的模型:等级估计的准确率分别为93.22%和95.23%;MSWS估计的RMSE分别为2.2和3.4,$R^2$分别为0.99和0.99。 摘要:Tropical cyclones are one of the most powerful and destructive natural phenomena on earth. Tropical storms and heavy rains can cause floods, which lead to human lives and economic loss. Devastating winds accompanying cyclones heavily affect not only the coastal regions, even distant areas. Our study focuses on the intensity estimation, particularly cyclone grade and maximum sustained surface wind speed (MSWS) of a tropical cyclone over the North Indian Ocean. We use various machine learning algorithms to estimate cyclone grade and MSWS. We have used the basin of origin, date, time, latitude, longitude, estimated central pressure, and pressure drop as attributes of our models. We use multi-class classification models for the categorical outcome variable, cyclone grade, and regression models for MSWS as it is a continuous variable. Using the best track data of 28 years over the North Indian Ocean, we estimate grade with an accuracy of 88% and MSWS with a root mean square error (RMSE) of 2.3. For higher grade categories (5-7), accuracy improves to an average of 98.84%. We tested our model with two recent tropical cyclones in the North Indian Ocean, Vayu and Fani. For grade, we obtained an accuracy of 93.22% and 95.23% respectively, while for MSWS, we obtained RMSE of 2.2 and 3.4 and $R^2$ of 0.99 and 0.99, respectively.

其他(24篇)

【1】 Hierarchical Neural Dynamic Policies 标题:分层神经动态策略

作者:Shikhar Bahl,Abhinav Gupta,Deepak Pathak 机构:Carnegie Mellon University 备注:Accepted at RSS 2021. Videos and code at this https URL 链接:https://arxiv.org/abs/2107.05627 摘要:我们在学习高维图像输入的同时,解决了对现实世界中动态任务的不可见配置的泛化问题。基于非线性动力学系统的方法已经成功地演示了机器人的动态行为,但是很难推广到不可见的结构以及从图像输入中学习。最近的工作通过使用深度网络策略和重参数化动作来嵌入动态系统的结构来解决这个问题,但是仍然在图像目标的不同配置域中挣扎,因此很难推广。在本文中,我们通过将动态系统的结构嵌入到一个称为层次神经动态策略(H-NDPs)的层次深度策略学习框架中来解决这种二分性。H-NDPs不是直接将深层动力系统与不同的数据相匹配,而是在状态空间中学习基于局部动力系统的策略,然后将其提取为仅从高维图像操作的基于全局动力系统的策略。H-NDP还提供了平滑的轨迹,在现实世界中具有强大的安全优势。我们在现实世界(数字书写、舀水和倒水)和模拟(抓、扔、拣)中对动态任务进行了广泛的实验。我们发现,H-NDPs可以很容易地与模仿和强化学习相结合,并获得最先进的结果。视频结果位于https://shikharbahl.github.io/hierarchical-ndps/ 摘要:We tackle the problem of generalization to unseen configurations for dynamic tasks in the real world while learning from high-dimensional image input. The family of nonlinear dynamical system-based methods have successfully demonstrated dynamic robot behaviors but have difficulty in generalizing to unseen configurations as well as learning from image inputs. Recent works approach this issue by using deep network policies and reparameterize actions to embed the structure of dynamical systems but still struggle in domains with diverse configurations of image goals, and hence, find it difficult to generalize. In this paper, we address this dichotomy by leveraging embedding the structure of dynamical systems in a hierarchical deep policy learning framework, called Hierarchical Neural Dynamical Policies (H-NDPs). Instead of fitting deep dynamical systems to diverse data directly, H-NDPs form a curriculum by learning local dynamical system-based policies on small regions in state-space and then distill them into a global dynamical system-based policy that operates only from high-dimensional images. H-NDPs additionally provide smooth trajectories, a strong safety benefit in the real world. We perform extensive experiments on dynamic tasks both in the real world (digit writing, scooping, and pouring) and simulation (catching, throwing, picking). We show that H-NDPs are easily integrated with both imitation as well as reinforcement learning setups and achieve state-of-the-art results. Video results are at https://shikharbahl.github.io/hierarchical-ndps/

【2】 Direct speech-to-speech translation with discrete units 标题:使用离散单元的直接语音到语音翻译

作者:Ann Lee,Peng-Jen Chen,Changhan Wang,Jiatao Gu,Xutai Ma,Adam Polyak,Yossi Adi,Qing He,Yun Tang,Juan Pino,Wei-Ning Hsu 机构:Facebook AI, Johns Hopkins University 链接:https://arxiv.org/abs/2107.05604 摘要:我们提出了一种直接语音到语音翻译(S2ST)模型,它不依赖中间文本生成,将一种语言的语音翻译为另一种语言的语音。以前的工作通过训练基于注意力的序列到序列模型来解决这一问题,该模型将源语音谱图映射为目标语音谱图。为了应对对目标语音连续谱图特征建模的挑战,我们转而预测从未标注语音语料库中学到的自监督离散表示。在有目标文本转写可用时,我们设计了一个语音与文本联合训练的多任务学习框架,使模型能在同一次推理中同时产生双模态输出(语音和文本)。在Fisher西班牙语-英语数据集上的实验表明,与预测谱图的基线相比,预测离散单元以及语音与文本联合训练使模型性能提高了11 BLEU,弥补了与级联系统之间83%的性能差距。在不使用任何文本转写进行训练时,我们的模型也能取得与"预测谱图并使用文本数据训练"的基线相近的性能。 摘要:We present a direct speech-to-speech translation (S2ST) model that translates speech from one language to speech in another language without relying on intermediate text generation. Previous work addresses the problem by training an attention-based sequence-to-sequence model that maps source speech spectrograms into target spectrograms. To tackle the challenge of modeling continuous spectrogram features of the target speech, we propose to predict the self-supervised discrete representations learned from an unlabeled speech corpus instead. When target text transcripts are available, we design a multitask learning framework with joint speech and text training that enables the model to generate dual mode output (speech and text) simultaneously in the same inference pass. Experiments on the Fisher Spanish-English dataset show that predicting discrete units and joint speech and text training improve model performance by 11 BLEU compared with a baseline that predicts spectrograms and bridges 83% of the performance gap towards a cascaded system. When trained without any text transcripts, our model achieves similar performance as a baseline that predicts spectrograms and is trained with text data.

【3】 Technical Report of Team GraphMIRAcles in the WikiKG90M-LSC Track of OGB-LSC @ KDD Cup 2021 标题:GraphMIRAcles团队在OGB-LSC @ KDD Cup 2021 WikiKG90M-LSC赛道的技术报告

作者:Jianyu Cai,Jiajun Chen,Taoxing Pan,Zhanqiu Zhang,Jie Wang 机构:University of Science and Technology of China 链接:https://arxiv.org/abs/2107.05476 摘要:大规模知识图谱中的链接预测近来受到越来越多的关注。OGB-LSC团队提出了OGB大规模挑战(OGB-LSC),它由三个真实世界数据集组成,用于推进大规模图机器学习的最新技术。本文介绍了我们的团队GraphMIRAcles在OGB-LSC @ KDD Cup 2021的WikiKG90M-LSC赛道中的解决方案。在该赛道中,目标是自动预测WikiKG90M(从Wikidata中提取的大规模知识图谱)中缺失的链接。为了应对这一挑战,我们提出了一个集成三个组件的框架——基础模型ComplEx-CMRC、规则挖掘器AMIE 3,以及用于预测缺失链接的推理模型。实验表明,我们的方案在测试集上达到了0.9707的MRR。此外,由于推理模型中的知识蒸馏使用了实际中不可得的测试尾实体候选,我们对知识蒸馏进行了消融研究。实验表明,不使用知识蒸馏的模型在完整验证集上的MRR达到0.9533。 摘要:Link prediction in large-scale knowledge graphs has gained increasing attention recently. The OGB-LSC team presented OGB Large-Scale Challenge (OGB-LSC), a collection of three real-world datasets for advancing the state-of-the-art in large-scale graph machine learning. In this paper, we introduce the solution of our team GraphMIRAcles in the WikiKG90M-LSC track of OGB-LSC @ KDD Cup 2021. In the WikiKG90M-LSC track, the goal is to automatically predict missing links in WikiKG90M, a large scale knowledge graph extracted from Wikidata. To address this challenge, we propose a framework that integrates three components -- a basic model ComplEx-CMRC, a rule miner AMIE 3, and an inference model to predict missing links. Experiments demonstrate that our solution achieves an MRR of 0.9707 on the test dataset. Moreover, as the knowledge distillation in the inference model uses test tail candidates -- which are unavailable in practice -- we conduct ablation studies on knowledge distillation. Experiments demonstrate that our model without knowledge distillation achieves an MRR of 0.9533 on the full validation dataset.
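其基础模型所依赖的ComplEx三元组打分函数是公开的经典形式,下面用NumPy给出示意(嵌入为随机玩具数据;CMRC部分摘要未给出细节,从略):

```python
import numpy as np

rng = np.random.default_rng(0)
dim = 16
e_h = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # 头实体复数嵌入
w_r = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # 关系复数嵌入

def complex_score(h, r, t):
    # ComplEx: score(h, r, t) = Re(<h, r, conj(t)>)
    return np.real(np.sum(h * r * np.conj(t)))

# 链接预测:对所有候选尾实体打分并排序,MRR 即基于该排序计算
candidates = rng.normal(size=(100, dim)) + 1j * rng.normal(size=(100, dim))
scores = np.real((e_h * w_r) @ np.conj(candidates.T))
print(np.argsort(scores)[::-1][:10])
```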

【4】 Parameter Selection: Why We Should Pay More Attention to It 标题:参数选择:为什么要引起我们的重视?

作者:Jie-Jyun Liu,Tsung-Han Yang,Si-An Chen,Chih-Jen Lin 机构:ASUS Intelligent Cloud Services, National Taiwan University 备注:Accepted by ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2107.05393 摘要:参数选择在有监督学习中的重要性是众所周知的。然而,由于参数组合较多,往往采用不完整或不充分的程序。这种情况可能会导致误导或混淆的结论。在本文中,通过一个有趣的例子,我们指出,严重性超出了一般公认的范围。在医学代码预测的多标签分类主题中,一篇有影响的论文对一个集合进行了适当的参数选择,但是当移到频繁出现的标签的子集时,作者使用了相同的参数而没有单独的调整。在随后的研究中,这组频繁出现的标签成为了一个流行的基准,它不断推动着技术的发展。然而,我们发现,如果当时进行了参数整定,这些研究中的大多数结果都不能超过原论文中的方法。因此,不清楚随后的事态发展实际带来了多大进展。这一教训清楚地表明,如果对参数选择没有足够的重视,本领域的研究进展可能是不确定的,甚至是虚幻的。 摘要:The importance of parameter selection in supervised learning is well known. However, due to the many parameter combinations, an incomplete or an insufficient procedure is often applied. This situation may cause misleading or confusing conclusions. In this opinion paper, through an intriguing example we point out that the seriousness goes beyond what is generally recognized. In the topic of multi-label classification for medical code prediction, one influential paper conducted a proper parameter selection on a set, but when moving to a subset of frequently occurring labels, the authors used the same parameters without a separate tuning. The set of frequent labels became a popular benchmark in subsequent studies, which kept pushing the state of the art. However, we discovered that most of the results in these studies cannot surpass the approach in the original paper if a parameter tuning had been conducted at the time. Thus it is unclear how much progress the subsequent developments have actually brought. The lesson clearly indicates that without enough attention on parameter selection, the research progress in our field can be uncertain or even illusive.

【5】 Identifying Hijacked Reviews 标题:识别被劫持的评论

作者:Monika Daryani,James Caverlee 机构:Texas A&M University, College Station, TX 备注:To be published in ACL-IJCNLP 2021 Workshop on e-Commerce and NLP (ECNLP) 链接:https://arxiv.org/abs/2107.05385 摘要:虚假评论和评论操纵是全球在线市场上日益严重的问题。评论劫持是一种新的评论操纵策略:不道德的卖家"劫持"现有的产品页面(通常是好评众多的页面),然后把标题、照片和描述等产品信息更换为一个完全不同的产品。由于先前的评论仍然保留,新商品看起来好评如潮。然而,目前还没有关于评论劫持的公开数据集,文献中对这种策略也知之甚少。因此,本文提出了由三部分组成的研究:(i)我们提出了一个通过交换产品和评论来生成评论劫持合成标注数据的框架;(ii)然后,我们评估了孪生LSTM网络和BERT句对分类器利用这些数据区分合法评论与被劫持评论的潜力;(iii)接着,我们将性能最佳的模型部署到原始数据中的31K个产品(含650万条评论)上,从中发现了数百个此前未知的评论劫持实例。 摘要:Fake reviews and review manipulation are growing problems on online marketplaces globally. Review Hijacking is a new review manipulation tactic in which unethical sellers "hijack" an existing product page (usually one with many positive reviews), then update the product details like title, photo, and description with those of an entirely different product. With the earlier reviews still attached, the new item appears well-reviewed. However, there are no public datasets of review hijacking and little is known in the literature about this tactic. Hence, this paper proposes a three-part study: (i) we propose a framework to generate synthetically labeled data for review hijacking by swapping products and reviews; (ii) then, we evaluate the potential of both a Twin LSTM network and BERT sequence pair classifier to distinguish legitimate reviews from hijacked ones using this data; and (iii) we then deploy the best performing model on a collection of 31K products (with 6.5 M reviews) in the original data, where we find 100s of previously unknown examples of review hijacking.

【6】 Structured Directional Pruning via Perturbation Orthogonal Projection 标题:基于扰动正交投影的结构化定向剪枝

作者:Yinchuan Li,Xiaofeng Liu,Yunfeng Shao,Qing Wang,Yanhui Geng 机构:Yunfeng Shao, Huawei Noah's Ark Lab, Qing Wang†, Tianjin University, Yanhui Geng 链接:https://arxiv.org/abs/2107.05328 摘要:结构化剪枝是一种减少神经网络计算量的有效压缩技术,通常通过添加扰动来削减网络参数,代价是训练损失略有增加。更合理的做法是沿着优化器(即随机梯度下降)找到的平坦极小值谷地寻找一个稀疏极小点,从而保持训练损失不变。为实现这一目标,我们提出了基于将扰动正交投影到平坦极小值谷地上的结构化定向剪枝。我们还提出了一个快速求解器sDprun,并进一步证明它在充分训练后渐近地实现定向剪枝。在CIFAR-10和CIFAR-100数据集上使用VGG-Net和ResNet的实验表明,我们的方法无需再训练即可获得最先进的剪枝后精度(如在VGG16、CIFAR-10任务上达到93.97%)。在MNIST、CIFAR-10和CIFAR-100数据集上使用DNN、VGG-Net和WRN28X10的实验表明,我们的方法能够执行结构化定向剪枝,到达与优化器相同的极小值谷地。 摘要:Structured pruning is an effective compression technique to reduce the computation of neural networks, which is usually achieved by adding perturbations to reduce network parameters at the cost of slightly increasing training loss. A more reasonable approach is to find a sparse minimizer along the flat minimum valley found by optimizers, i.e. stochastic gradient descent, which keeps the training loss constant. To achieve this goal, we propose the structured directional pruning based on orthogonal projecting the perturbations onto the flat minimum valley. We also propose a fast solver sDprun and further prove that it achieves directional pruning asymptotically after sufficient training. Experiments using VGG-Net and ResNet on CIFAR-10 and CIFAR-100 datasets show that our method obtains the state-of-the-art pruned accuracy (i.e. 93.97% on VGG16, CIFAR-10 task) without retraining. Experiments using DNN, VGG-Net and WRN28X10 on MNIST, CIFAR-10 and CIFAR-100 datasets demonstrate our method performs structured directional pruning, reaching the same minimum valley as the optimizer.
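下面用一个玩具二次损失给出"把剪枝扰动正交投影到平坦方向子空间"这一几何思想的数值示意(Hessian与阈值均为假设,仅说明投影为何能近似保持训练损失):

```python
import numpy as np

H = np.diag([5.0, 3.0, 1e-8, 1e-8])    # 玩具 Hessian:两个陡峭方向 + 两个平坦方向
eigval, eigvec = np.linalg.eigh(H)
flat = eigvec[:, eigval < 1e-6]         # 平坦极小值谷地的一组基

rng = np.random.default_rng(0)
w = rng.normal(size=4)                  # 当前参数
delta = -w                              # 朴素的"直接置零"剪枝扰动
delta_proj = flat @ (flat.T @ delta)    # 正交投影到平坦子空间后的定向剪枝扰动

quad = lambda d: 0.5 * d @ H @ d        # 二阶近似下的损失增量
print(quad(delta), quad(delta_proj))    # 投影后损失增量几乎为 0
```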

【7】 DaCy: A Unified Framework for Danish NLP 标题:DaCy:丹麦NLP的统一框架

作者:Kenneth Enevoldsen,Lasse Hansen,Kristoffer Nielbo 机构:Department of Clinical Medicine & Center for Humanities Computing Aarhus, Aarhus University, Aarhus C, Denmark, Interacting Minds Centre 备注:8 pages, 5 tables 链接:https://arxiv.org/abs/2107.05295 摘要:近年来,随着多个新数据集和模型的加入,丹麦自然语言处理(NLP)取得了长足进步。然而,目前还没有一个连贯的框架用于为丹麦语应用最先进的模型。我们提出DaCy:一个基于SpaCy构建的丹麦NLP统一框架。DaCy使用高效的多任务模型,在命名实体识别、词性标注和依存句法分析上取得了最先进的性能。DaCy还包含便于集成现有模型(如极性、情感或主观性检测)的工具。此外,我们通过对DaNE测试集进行数据增强,对丹麦NLP流水线的偏差和鲁棒性进行了一系列测试。DaCy large表现出色,对较长的输入长度、拼写变体和拼写错误尤其鲁棒。除DaCy large外,所有模型都表现出与族裔相关的显著偏差,而只有Polyglot表现出显著的性别偏差。我们认为,对于基准集有限的语言,数据增强对于获得更真实、更细粒度的性能估计尤其有用。我们提供了一系列数据增强器,作为对中低资源语言的语言模型进行更全面评估的第一步,并鼓励进一步的开发。 摘要:Danish natural language processing (NLP) has in recent years obtained considerable improvements with the addition of multiple new datasets and models. However, at present, there is no coherent framework for applying state-of-the-art models for Danish. We present DaCy: a unified framework for Danish NLP built on SpaCy. DaCy uses efficient multitask models which obtain state-of-the-art performance on named entity recognition, part-of-speech tagging, and dependency parsing. DaCy contains tools for easy integration of existing models such as for polarity, emotion, or subjectivity detection. In addition, we conduct a series of tests for biases and robustness of Danish NLP pipelines through augmentation of the test set of DaNE. DaCy large compares favorably and is especially robust to long input lengths and spelling variations and errors. All models except DaCy large display significant biases related to ethnicity while only Polyglot shows a significant gender bias. We argue that for languages with limited benchmark sets, data augmentation can be particularly useful for obtaining more realistic and fine-grained performance estimates. We provide a series of augmenters as a first step towards a more thorough evaluation of language models for low and medium resource languages and encourage further development.

【8】 Continuous Time Bandits With Sampling Costs 标题:具有采样成本的连续时间多臂强盗

作者:Rahul Vaze,Manjesh K. Hanawal 机构:School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, Maharashtra, India, Industrial Engineering and Operations Research, Indian Institute of Technology Bombay, Mumbai, Maharashtra, India 链接:https://arxiv.org/abs/2107.05289 摘要:我们考虑一个连续时间多臂强盗问题(CTMAB):学习者可以在给定的时间区间内对臂进行任意次数的采样,并从每次采样中获得随机奖励,然而提高采样频率会带来加性的惩罚/成本。因此,作为采样频率的函数,在获取大量奖励与付出采样成本之间存在权衡。目标是设计一个最小化遗憾的学习算法,遗憾定义为oracle策略的收益与学习算法收益之差。CTMAB与通常的多臂强盗问题(MAB)有本质的不同,例如在CTMAB中即使单臂情形也并不平凡,因为最优采样频率取决于臂的均值,而均值需要估计。我们首先建立任何算法可达到的遗憾下界,然后提出在对数因子范围内达到该下界的算法。对于单臂情形,我们证明遗憾的下界为$\Omega((\log T)^2/\mu)$,其中$\mu$是臂的均值,$T$是时间范围。对于多臂情形,我们证明遗憾的下界为$\Omega((\log T)^2\mu/\Delta^2)$,其中$\mu$此时表示最优臂的均值,$\Delta$是最优臂与次优臂均值之差。随后我们提出了一个在常数项范围内达到该界的算法。 摘要:We consider a continuous-time multi-arm bandit problem (CTMAB), where the learner can sample arms any number of times in a given interval and obtain a random reward from each sample, however, increasing the frequency of sampling incurs an additive penalty/cost. Thus, there is a tradeoff between obtaining large reward and incurring sampling cost as a function of the sampling frequency. The goal is to design a learning algorithm that minimizes regret, that is defined as the difference of the payoff of the oracle policy and that of the learning algorithm. CTMAB is fundamentally different than the usual multi-arm bandit problem (MAB), e.g., even the single-arm case is non-trivial in CTMAB, since the optimal sampling frequency depends on the mean of the arm, which needs to be estimated. We first establish lower bounds on the regret achievable with any algorithm and then propose algorithms that achieve the lower bound up to logarithmic factors. For the single-arm case, we show that the lower bound on the regret is $\Omega((\log T)^2/\mu)$, where $\mu$ is the mean of the arm, and $T$ is the time horizon. For the multiple arms case, we show that the lower bound on the regret is $\Omega((\log T)^2 \mu/\Delta^2)$, where $\mu$ now represents the mean of the best arm, and $\Delta$ is the difference of the mean of the best and the second-best arm. We then propose an algorithm that achieves the bound up to constant terms.

【9】 Cautious Actor-Critic 标题:谨慎的演员-评论家

作者:Lingwei Zhu,Toshinori Kitamura,Takamitsu Matsubara 机构:Nara Institute of Science and Technology, Japan 备注:23 pages 链接:https://arxiv.org/abs/2107.05217 摘要:非策略学习的振荡性能和行为-批评(AC)设置中的持续错误要求算法能够保守地学习以更好地适应稳定性关键应用。本文提出了一种新的非策略AC算法CAC。“谨慎”这个名字来源于双重保守的性质,即我们利用保守策略迭代中的经典策略插值作为参与者,利用保守值迭代的熵正则化作为批评家。我们的主要观察结果是熵正则化的批评家促进和简化了笨拙的内插actor更新,同时确保了稳健的策略改进。在一系列具有挑战性的连续控制问题上,我们将CAC与最新的AC方法进行了比较,并证明了CAC在显著稳定学习的同时,具有相当的性能。 摘要:The oscillating performance of off-policy learning and persisting errors in the actor-critic (AC) setting call for algorithms that can conservatively learn to suit the stability-critical applications better. In this paper, we propose a novel off-policy AC algorithm cautious actor-critic (CAC). The name cautious comes from the doubly conservative nature that we exploit the classic policy interpolation from conservative policy iteration for the actor and the entropy-regularization of conservative value iteration for the critic. Our key observation is the entropy-regularized critic facilitates and simplifies the unwieldy interpolated actor update while still ensuring robust policy improvement. We compare CAC to state-of-the-art AC methods on a set of challenging continuous control problems and demonstrate that CAC achieves comparable performance while significantly stabilizes learning.

【10】 MOOCRep: A Unified Pre-trained Embedding of MOOC Entities 标题:MOOCRep:MOOC实体的统一预训练嵌入

作者:Shalini Pandey,Jaideep Srivastava 链接:https://arxiv.org/abs/2107.05154 摘要:许多机器学习模型都是为了解决大规模在线开放课程(MOOC)平台上的信息过载问题而建立的。这些模型依赖于学习MOOC实体的强大表示。然而,他们面临着缺乏专家标签数据的问题。为了克服这个问题,我们提出利用MOOC结构中丰富的未标记数据来学习MOOC实体的预训练表示,这些数据可以直接应用于下游任务。虽然现有的预训练方法在自然语言处理领域已经取得了成功,因为它们学习到了强大的文本表示,但是它们的模型并没有利用MOOC实体的丰富信息。这些丰富的信息包括讲座、概念和课程之间的图形关系,以及关于概念复杂性的领域知识。我们开发了MOOCRep,这是一种基于Transformer语言模型的新方法,训练有两个预训练目标:1)基于图的目标来捕捉存在于图中的实体和关系的强大信号;2)面向领域的目标来有效地整合概念的复杂度。我们的实验表明,MOOCRep的嵌入在概念前提预测和讲座推荐这两项对教育界非常重要的任务上优于最先进的表征学习方法。 摘要:Many machine learning models have been built to tackle information overload issues on Massive Open Online Courses (MOOC) platforms. These models rely on learning powerful representations of MOOC entities. However, they suffer from the problem of scarce expert label data. To overcome this problem, we propose to learn pre-trained representations of MOOC entities using abundant unlabeled data from the structure of MOOCs which can directly be applied to the downstream tasks. While existing pre-training methods have been successful in NLP areas as they learn powerful textual representation, their models do not leverage the richer information about MOOC entities. This richer information includes the graph relationship between the lectures, concepts, and courses along with the domain knowledge about the complexity of a concept. We develop MOOCRep, a novel method based on Transformer language model trained with two pre-training objectives : 1) graph-based objective to capture the powerful signal of entities and relations that exist in the graph, and 2) domain-oriented objective to effectively incorporate the complexity level of concepts. Our experiments reveal that MOOCRep's embeddings outperform state-of-the-art representation learning methods on two tasks important for education community, concept pre-requisite prediction and lecture recommendation.

【11】 LexSubCon: Integrating Knowledge from Lexical Resources into Contextual Embeddings for Lexical Substitution 标题:LexSubCon:将词汇资源中的知识整合到上下文嵌入中进行词汇替换

作者:George Michalopoulos,Ian McKillop,Alexander Wong,Helen Chen 机构:University of Waterloo, Waterloo, Canada 备注:11 pages, 1 figure 链接:https://arxiv.org/abs/2107.05132 摘要:词汇替换任务是指在给定的文本语境中为某个词生成有意义的替代词。上下文词嵌入模型依靠从句子中被替换词处提取的上下文信息,在词汇替换任务中取得了最先进的结果。然而,这类模型没有考虑外部词汇数据库中存在的结构化知识。我们提出LexSubCon,一个基于上下文嵌入模型的端到端词汇替换框架,能够识别高度准确的候选替代词。这是通过将上下文信息与来自结构化词汇资源的知识相结合实现的。我们的方法包括:(i)通过对目标词输入嵌入与其可能同义词的平均嵌入进行线性插值,在构造目标词输入嵌入时引入一种新颖的mix-up嵌入策略;(ii)考虑目标词与其候选替代词的句子-定义嵌入之间的相似性;(iii)通过一个微调过的句子相似度模型,计算每个替换对句子语义的影响。实验表明,在词汇替换任务广泛使用的LS07和CoInCo基准数据集上,LexSubCon的性能优于以往最先进的方法。 摘要:Lexical substitution is the task of generating meaningful substitutes for a word in a given textual context. Contextual word embedding models have achieved state-of-the-art results in the lexical substitution task by relying on contextual information extracted from the replaced word within the sentence. However, such models do not take into account structured knowledge that exists in external lexical databases. We introduce LexSubCon, an end-to-end lexical substitution framework based on contextual embedding models that can identify highly accurate substitute candidates. This is achieved by combining contextual information with knowledge from structured lexical resources. Our approach involves: (i) introducing a novel mix-up embedding strategy in the creation of the input embedding of the target word through linearly interpolating the pair of the target input embedding and the average embedding of its probable synonyms; (ii) considering the similarity of the sentence-definition embeddings of the target word and its proposed candidates; and, (iii) calculating the effect of each substitution in the semantics of the sentence through a fine-tuned sentence similarity model. Our experiments show that LexSubCon outperforms previous state-of-the-art methods on LS07 and CoInCo benchmark datasets that are widely used for lexical substitution tasks.
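摘要中的mix-up嵌入策略可以直接写成一行线性插值,下面给出示意(维度、同义词数量与插值系数均为假设):

```python
import numpy as np

rng = np.random.default_rng(0)
e_target = rng.normal(size=768)            # 目标词的输入嵌入
e_synonyms = rng.normal(size=(5, 768))     # 词汇资源给出的若干同义词嵌入

alpha = 0.75                               # 插值系数(假设值)
e_mix = alpha * e_target + (1 - alpha) * e_synonyms.mean(axis=0)
print(e_mix.shape)                         # 用 e_mix 替换目标词的输入嵌入再送入上下文模型
```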

【12】 SGD: The Role of Implicit Regularization, Batch-size and Multiple-epochs 标题:SGD:隐式正则化、批量大小与多轮训练的作用

作者:Satyen Kale,Ayush Sekhari,Karthik Sridharan 机构:Google Research, NY, Cornell University 链接:https://arxiv.org/abs/2107.05074 摘要:多轮、小批量的随机梯度下降(SGD)一直是学习大型过参数化模型的首选方法。关于SGD为何在实践中表现出色,一种流行的理论解释是该算法具有隐式正则化,使其输出偏向好的解。也许理论上理解得最透彻的SGD学习设定是随机凸优化(SCO),众所周知,SGD在其中以$O(1/\sqrt{n})$的速率学习,其中$n$是样本数。在本文中,我们考虑SCO问题,探讨隐式正则化、批量大小与多轮训练对SGD的作用。我们的主要贡献有三点:(a)我们证明对于任何正则化子,都存在一个正则化经验风险最小化(RERM)无法学习的SCO问题,这自动排除了任何基于隐式正则化的对SGD成功的解释。(b)我们在样本复杂度方面给出了SGD与基于经验损失的梯度下降(GD)学习之间的分离:存在一个SCO问题,使得任意步长与迭代次数的GD只能以次优速率学习,至少为$\widetilde{\Omega}(1/n^{5/12})$。(c)我们给出了实践中常用的SGD多轮变体,并证明该算法在最坏情况下至少与单遍SGD一样好;然而对于某些SCO问题,对数据集进行多次遍历可以显著优于单遍SGD。我们进一步将结果推广到一般学习设定:给出一个对任何数据分布都可学习的问题,对该问题而言,无论使用何种正则化函数,SGD都严格优于RERM。最后,我们讨论了这些结果对深度学习的启示,并展示了两层对角神经网络上SGD与ERM之间的分离。 摘要:Multi-epoch, small-batch, Stochastic Gradient Descent (SGD) has been the method of choice for learning with large over-parameterized models. A popular theory for explaining why SGD works well in practice is that the algorithm has an implicit regularization that biases its output towards a good solution. Perhaps the theoretically most well understood learning setting for SGD is that of Stochastic Convex Optimization (SCO), where it is well known that SGD learns at a rate of $O(1/\sqrt{n})$, where $n$ is the number of samples. In this paper, we consider the problem of SCO and explore the role of implicit regularization, batch size and multiple epochs for SGD. Our main contributions are threefold: (a) We show that for any regularizer, there is an SCO problem for which Regularized Empirical Risk Minimzation fails to learn. This automatically rules out any implicit regularization based explanation for the success of SGD. (b) We provide a separation between SGD and learning via Gradient Descent on empirical loss (GD) in terms of sample complexity. We show that there is an SCO problem such that GD with any step size and number of iterations can only learn at a suboptimal rate: at least $\widetilde{\Omega}(1/n^{5/12})$. (c) We present a multi-epoch variant of SGD commonly used in practice. We prove that this algorithm is at least as good as single pass SGD in the worst case. However, for certain SCO problems, taking multiple passes over the dataset can significantly outperform single pass SGD. We extend our results to the general learning setting by showing a problem which is learnable for any data distribution, and for this problem, SGD is strictly better than RERM for any regularization function. We conclude by discussing the implications of our results for deep learning, and show a separation between SGD and ERM for two layer diagonal neural networks.

【13】 Neural Waveshaping Synthesis 标题:神经波形塑形合成

作者:Ben Hayes,Charalampos Saitis,György Fazekas 机构:Centre for Digital Music, Queen Mary University of London 备注:Accepted to ISMIR 2021; See online supplement at this https URL 链接:https://arxiv.org/abs/2107.05050 摘要:我们提出了神经波形塑形单元(NEWT):一种新颖、轻量、完全因果的神经音频合成方法,它直接在波形域中工作,并配有用于高效CPU推理的优化版本(FastNEWT)。NEWT使用带周期激活的时间分布多层感知机,隐式地学习编码目标音色特征的非线性传递函数。训练完成后,NEWT只需对其输入和输出信号做简单的仿射变换,即可产生复杂的音色演变。我们将NEWT与一个可微噪声合成器和混响器配对,发现它在以F0和响度特征为条件时,仅用260k总模型参数就能生成逼真的乐器演奏。我们通过多刺激听力测试和Fréchet音频距离将我们的方法与最先进的基准进行了比较,发现它在所测试的音色域中具有竞争力。我们的方法在生成速度上显著优于基准,并且无论是否使用FastNEWT都能在消费级CPU上实现实时性能,表明它可以作为未来创意声音设计工具的可行基础。 摘要:We present the Neural Waveshaping Unit (NEWT): a novel, lightweight, fully causal approach to neural audio synthesis which operates directly in the waveform domain, with an accompanying optimisation (FastNEWT) for efficient CPU inference. The NEWT uses time-distributed multilayer perceptrons with periodic activations to implicitly learn nonlinear transfer functions that encode the characteristics of a target timbre. Once trained, a NEWT can produce complex timbral evolutions by simple affine transformations of its input and output signals. We paired the NEWT with a differentiable noise synthesiser and reverb and found it capable of generating realistic musical instrument performances with only 260k total model parameters, conditioned on F0 and loudness features. We compared our method to state-of-the-art benchmarks with a multi-stimulus listening test and the Fr\'echet Audio Distance and found it performed competitively across the tested timbral domains. Our method significantly outperformed the benchmarks in terms of generation speed, and achieved real-time performance on a consumer CPU, both with and without FastNEWT, suggesting it is a viable basis for future creative sound design tools.
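下面用PyTorch给出"带周期激活的小型MLP作为可学习波形塑形器,并通过输入/输出仿射变换控制音色"的示意(具体网络结构为假设,非论文原实现):

```python
import torch

class TinyWaveshaper(torch.nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.l1 = torch.nn.Linear(1, hidden)
        self.l2 = torch.nn.Linear(hidden, 1)

    def forward(self, x, in_gain=1.0, in_bias=0.0, out_gain=1.0, out_bias=0.0):
        z = in_gain * x + in_bias                  # 输入仿射变换:控制音色演变
        z = torch.sin(self.l1(z.unsqueeze(-1)))    # 周期激活,隐式学习非线性传递函数
        return out_gain * self.l2(z).squeeze(-1) + out_bias

shaper = TinyWaveshaper()
t = torch.linspace(0, 1, 16000)
excitation = torch.sin(2 * torch.pi * 220 * t)     # 220 Hz 正弦激励
audio = shaper(excitation, in_gain=2.0)            # 改变 in_gain 即改变谐波含量
print(audio.shape)
```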

【14】 Coordinate-wise Control Variates for Deep Policy Gradients 标题:深度策略梯度的逐坐标控制变量

作者:Yuanyi Zhong,Yuan Zhou,Jian Peng 机构: 1Department of Computer Science, UIUC 2Department of Industrial and Enterprise Systems Engineering 备注:14 pages, 3 figures 链接:https://arxiv.org/abs/2107.04987 摘要:控制变量(CV)方法被广泛用于策略梯度估计,以降低实践中梯度估计量的方差。具体做法是从状态-动作值估计中减去一个基线函数。降低了方差的策略梯度有望带来更高的学习效率。近来,针对深度神经网络策略的控制变量研究主要集中在标量值基线函数上,向量值基线的影响尚未得到充分探索。本文研究了由向量值基线构造的逐坐标与逐层控制变量在神经网络策略中的方差缩减效果。我们给出的实验证据表明,与传统的标量值基线相比,这类基线可以获得更低的方差。我们演示了如何为流行的近端策略优化(PPO)算法配备这些新的控制变量,并表明经过适当正则化后,所得算法在连续控制基准上比使用标量控制变量具有更高的样本效率。 摘要:The control variates (CV) method is widely used in policy gradient estimation to reduce the variance of the gradient estimators in practice. A control variate is applied by subtracting a baseline function from the state-action value estimates. Then the variance-reduced policy gradient presumably leads to higher learning efficiency. Recent research on control variates with deep neural net policies mainly focuses on scalar-valued baseline functions. The effect of vector-valued baselines is under-explored. This paper investigates variance reduction with coordinate-wise and layer-wise control variates constructed from vector-valued baselines for neural net policies. We present experimental evidence suggesting that lower variance can be obtained with such baselines than with the conventional scalar-valued baseline. We demonstrate how to equip the popular Proximal Policy Optimization (PPO) algorithm with these new control variates. We show that the resulting algorithm with proper regularization can achieve higher sample efficiency than scalar control variates in continuous control benchmarks.
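下面给出逐坐标最优常数基线的数值示意(假设性实现,非论文原方法):对得分向量的每个坐标分别取 b_i = E[s_i^2 q] / E[s_i^2],并与单一标量基线比较梯度估计的方差:

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 20000, 4
s = rng.normal(size=(N, D)) * np.array([0.3, 1.0, 3.0, 10.0])          # 各坐标尺度悬殊的"得分"向量
q = 5.0 + (s ** 2) @ np.abs(rng.normal(size=D)) + rng.normal(size=N)   # 与 s 相关的回报

b_scalar = q.mean()                                          # 标量基线:平均回报
b_coord = (s ** 2 * q[:, None]).mean(0) / (s ** 2).mean(0)   # 逐坐标最优常数基线

var_scalar = (s * (q[:, None] - b_scalar)).var(axis=0)
var_coord = (s * (q[:, None] - b_coord)).var(axis=0)
print(var_scalar.sum(), var_coord.sum())                     # 逐坐标基线的总方差更低
```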

【15】 Non-linear Visual Knowledge Discovery with Elliptic Paired Coordinates 标题:基于椭圆配对坐标的非线性视觉知识发现

作者:Rose McDonald,Boris Kovalerchuk 机构:Dept. of Computer Science, Central Washington University, USA 备注:29 pages, 29 figures, 12 tables 链接:https://arxiv.org/abs/2107.04974 摘要:对于人类来说,用肉眼在超过2-3维的数据中发现视觉知识是一个挑战。本章探讨使用新的椭圆成对坐标(EPC)可视化交互发现预测机器学习模型的效率。实验结果表明,EPC能够实现多维数据的可视化,并支持在二维空间保存多维信息的可视化机器学习。相对于平行坐标和径向坐标,EPC可视化只需要每个n-D点的一半视觉元素。本文开发了一个交互式软件系统EllipseVis,它处理高维数据集,创建EPC可视化,并通过发现EPC中的优势规则生成预测分类模型。通过使用交互式和自动过程,它发现了单一类别的高优势区EPC。EPC方法在计算实验中成功地发现了覆盖率高、精度高的非线性预测模型。这可以通过产生视觉上吸引人的优势规则而使多个域受益。本章介绍了在实验中使用真实数据和模拟数据成功测试EPC非线性方法的结果,将EPC推广到动态椭圆成对坐标(DEPC),合并坐标权重以优化视觉发现,介绍了一种可供选择的EPC设计方案,并介绍了基于EPC/DEPC的松散机器学习方法的概念。 摘要:It is challenging for humans to enable visual knowledge discovery in data with more than 2-3 dimensions with a naked eye. This chapter explores the efficiency of discovering predictive machine learning models interactively using new Elliptic Paired coordinates (EPC) visualizations. It is shown that EPC are capable to visualize multidimensional data and support visual machine learning with preservation of multidimensional information in 2-D. Relative to parallel and radial coordinates, EPC visualization requires only a half of the visual elements for each n-D point. An interactive software system EllipseVis, which is developed in this work, processes high-dimensional datasets, creates EPC visualizations, and produces predictive classification models by discovering dominance rules in EPC. By using interactive and automatic processes it discovers zones in EPC with a high dominance of a single class. The EPC methodology has been successful in discovering non-linear predictive models with high coverage and precision in the computational experiments. This can benefit multiple domains by producing visually appealing dominance rules. This chapter presents results of successful testing the EPC non-linear methodology in experiments using real and simulated data, EPC generalized to the Dynamic Elliptic Paired Coordinates (DEPC), incorporation of the weights of coordinates to optimize the visual discovery, introduction of an alternative EPC design and introduction of the concept of incompact machine learning methodology based on EPC/DEPC.

【16】 LS3: Latent Space Safe Sets for Long-Horizon Visuomotor Control of Iterative Tasks 标题:LS3:面向迭代任务长时域视觉运动控制的潜在空间安全集

作者:Albert Wilcox,Ashwin Balakrishna,Brijen Thananjeyan,Joseph E. Gonzalez,Ken Goldberg 备注:Preprint, Under Review. First two authors contributed equally 链接:https://arxiv.org/abs/2107.04775 摘要:强化学习(RL)算法在探索高维环境以学习复杂、长时域的任务方面取得了令人瞩目的成功,但在探索不受约束时往往表现出不安全的行为,并需要大量的环境交互。在动态不确定环境中,安全学习的一个有前途的策略是要求agent能够鲁棒地返回到可保证任务成功(从而保证安全)的状态。虽然这种方法在低维环境中取得了成功,但在以图像为代表的高维状态空间环境中施加这种约束颇具挑战。我们提出潜在空间安全集(LS3),它利用次优演示和学习到的动力学模型,将探索限制在所学安全集(任务很可能在其中完成)的邻域内,从而把上述策略扩展到以图像为观测的迭代式长时域任务。我们在4个领域上评估了LS3,包括一个具有挑战性的模拟序列推动任务和一个物理电缆布线任务。我们发现,LS3能够利用先前任务的成功经验来限制探索,在满足约束的同时比已有算法学习得更高效。代码和补充材料见 https://tinyurl.com/latent-ss 。 摘要:Reinforcement learning (RL) algorithms have shown impressive success in exploring high-dimensional environments to learn complex, long-horizon tasks, but can often exhibit unsafe behaviors and require extensive environment interaction when exploration is unconstrained. A promising strategy for safe learning in dynamically uncertain environments is requiring that the agent can robustly return to states where task success (and therefore safety) can be guaranteed. While this approach has been successful in low-dimensions, enforcing this constraint in environments with high-dimensional state spaces, such as images, is challenging. We present Latent Space Safe Sets (LS3), which extends this strategy to iterative, long-horizon tasks with image observations by using suboptimal demonstrations and a learned dynamics model to restrict exploration to the neighborhood of a learned Safe Set where task completion is likely. We evaluate LS3 on 4 domains, including a challenging sequential pushing task in simulation and a physical cable routing task. We find that LS3 can use prior task successes to restrict exploration and learn more efficiently than prior algorithms while satisfying constraints. See https://tinyurl.com/latent-ss for code and supplementary material.
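
下面用Python给出"将探索限制在潜空间安全集邻域内"这一思路的极简示意。这是假设性示例:safe_latents、radius 等均为演示占位,安全集判据用最近邻半径近似;真实的LS3还包含学习到的编码器、动力学模型与规划器:

import numpy as np

def in_safe_set(z, safe_latents, radius):
    """若潜向量 z 与任一已知安全潜向量的距离不超过 radius,
    则视为落在安全集邻域内(演示用判据;论文中的安全集由学习得到)。"""
    d = np.linalg.norm(safe_latents - z, axis=1)
    return d.min() <= radius

def filter_actions(z_next_candidates, safe_latents, radius):
    """只保留预测后继潜状态落在安全集邻域内的候选动作下标。"""
    return [i for i, z in enumerate(z_next_candidates)
            if in_safe_set(z, safe_latents, radius)]

# 用随机数模拟:10 个候选动作各对应一个预测后继潜状态(4 维)
rng = np.random.default_rng(1)
safe_latents = rng.normal(size=(50, 4))   # 来自成功轨迹的潜向量(演示)
candidates = rng.normal(size=(10, 4))
print(filter_actions(candidates, safe_latents, radius=2.0))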

【17】 Lifelong Mixture of Variational Autoencoders 标题:变分自动编码器的终身混合

作者:Fei Ye,Adrian G. Bors 机构:Department of Computer Science, University of York, York YO,GH, UK 备注:Accepted by IEEE Transactions on Neural Networks and Learning Systems 链接:https://arxiv.org/abs/2107.04694 摘要:本文提出一种端到端的终身学习专家混合模型,其中每个专家由一个变分自编码器(VAE)实现。混合系统中的各专家通过最大化给定训练样本对数似然上的"分量证据下界混合"(MELBO)进行联合训练。混合系数控制每个专家对目标表示的贡献;它们从狄利克雷分布中采样,该分布的参数在终身学习过程中通过非参数估计确定。当新任务与先前学过的任务相似时,该模型能快速学习新任务。所提出的VAE终身混合模型(L-MVAE)在学习全新任务时会用新组件扩展其体系结构。训练完成后,模型在输入新数据样本时能自动确定应使用的相关专家;由于推理时只使用一个专家,这种机制同时提高了存储效率并降低了计算开销。L-MVAE的推理模型能够在与不同任务相关的数据域之间的联合潜空间中进行插值,并被证明对解耦表示学习是有效的。 摘要:In this paper, we propose an end-to-end lifelong learning mixture of experts. Each expert is implemented by a Variational Autoencoder (VAE). The experts in the mixture system are jointly trained by maximizing a mixture of individual component evidence lower bounds (MELBO) on the log-likelihood of the given training samples. The mixing coefficients in the mixture control the contributions of each expert in the goal representation. These are sampled from a Dirichlet distribution whose parameters are determined through non-parametric estimation during lifelong learning. The model can learn new tasks fast when these are similar to those previously learnt. The proposed Lifelong mixture of VAE (L-MVAE) expands its architecture with new components when learning a completely new task. After the training, our model can automatically determine the relevant expert to be used when fed with new data samples. This mechanism benefits both the memory efficiency and the required computational cost as only one expert is used during the inference. The L-MVAE inference model is able to perform interpolation in the joint latent space across the data domains associated with different tasks and is shown to be efficient for disentangled learning representation.
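
下面给出"各分量ELBO按混合系数加权求和"这一目标的极简PyTorch示意。这是假设性示例:每个专家的ELBO以最简单的对角高斯VAE形式写出,混合系数 pi 视为已由狄利克雷分布采样得到;论文中MELBO的精确形式可能与此不同,此处仅演示组合方式:

import torch

def elbo(x, recon, mu, logvar):
    """单个VAE专家的ELBO:高斯重构项(以负MSE近似)减去KL散度,逐样本返回。"""
    recon_term = -((x - recon) ** 2).sum(dim=1)
    kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)
    return recon_term - kl                       # 形状 (batch,)

def melbo(x, expert_outputs, pi):
    """各专家ELBO按混合系数 pi(形状 (K,))加权求和,作为最大化目标。
    expert_outputs: K 个 (recon, mu, logvar) 三元组的列表。"""
    elbos = torch.stack([elbo(x, r, m, lv) for r, m, lv in expert_outputs], dim=1)
    return (elbos * pi).sum(dim=1).mean()        # 对 batch 取平均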

【18】 The Effects of Invertibility on the Representational Complexity of Encoders in Variational Autoencoders 标题:可逆性对变分自编码器中编码器表征复杂度的影响

作者:Divyansh Pareek,Andrej Risteski 机构:Machine Learning Department, Carnegie Mellon University 备注:34 pages 链接:https://arxiv.org/abs/2107.04652 摘要:训练和使用现代的基于神经网络的潜变量生成模型(如变分自编码器)通常需要同时训练一个生成方向和一个推理(编码)方向,后者近似潜变量的后验分布。由此产生一个问题:为了能够准确刻画给定生成模型的后验分布,推理模型需要多复杂?本文指出了生成映射的一个影响编码器所需规模的重要性质。我们证明,如果生成映射是"强可逆的"(我们对此给出了恰当的形式化定义),那么推理模型就无需复杂太多;相反,我们证明存在不可逆的生成映射,其编码方向需要指数级增大(在标准的计算复杂性假设下)。重要的是,我们不要求生成模型逐层可逆;后者是许多相关文献的假设,而实践中使用的许多架构(如基于卷积和池化的网络)并不满足。因此,我们为"当数据位于低维流形上时,学习深度生成模型更加困难"这一经验认识提供了理论支持。 摘要:Training and using modern neural-network based latent-variable generative models (like Variational Autoencoders) often require simultaneously training a generative direction along with an inferential(encoding) direction, which approximates the posterior distribution over the latent variables. Thus, the question arises: how complex does the inferential model need to be, in order to be able to accurately model the posterior distribution of a given generative model? In this paper, we identify an important property of the generative map impacting the required size of the encoder. We show that if the generative map is "strongly invertible" (in a sense we suitably formalize), the inferential model need not be much more complex. Conversely, we prove that there exist non-invertible generative maps, for which the encoding direction needs to be exponentially larger (under standard assumptions in computational complexity). Importantly, we do not require the generative model to be layerwise invertible, which a lot of the related literature assumes and isn't satisfied by many architectures used in practice (e.g. convolution and pooling based networks). Thus, we provide theoretical support for the empirical wisdom that learning deep generative models is harder when data lies on a low-dimensional manifold.
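
作为背景补充:推理(编码)方向通常通过最大化证据下界(ELBO)来近似后验分布,这是标准教科书结果而非该论文的新贡献。编码器 $q_\phi(z\mid x)$ 的表达能力须足以逼近真实后验 $p_\theta(z\mid x)$,这正是上文"推理模型需要多复杂"一问的出发点:

\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z\mid x)}\!\left[\log p_\theta(x\mid z)\right] \;-\; \mathrm{KL}\!\left(q_\phi(z\mid x)\,\middle\|\,p(z)\right)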

【19】 Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization 标题:准确率在一条线上:论分布外泛化与分布内泛化的强相关性

作者:John Miller,Rohan Taori,Aditi Raghunathan,Shiori Sagawa,Pang Wei Koh,Vaishaal Shankar,Percy Liang,Yair Carmon,Ludwig Schmidt 机构:Tel Aviv University 链接:https://arxiv.org/abs/2107.04649 摘要:要使机器学习系统可靠,我们必须了解它们在未见过的分布外环境中的性能。本文通过实证表明,对于很广泛的一类模型和分布偏移,分布外性能与分布内性能高度相关。具体而言,我们在以下任务上展示了分布内与分布外性能之间的强相关性:CIFAR-10与ImageNet的多个变体、一项由YCB物体派生的合成姿态估计任务、FMoW-WILDS中的卫星图像分类,以及iWildCam-WILDS中的野生动物分类。这种强相关性在不同的模型结构、超参数、训练集大小和训练时长上均成立,而且比现有领域自适应理论所预期的更精确。为补全全貌,我们也考察了相关性较弱的情形,例如来自CIFAR-10-C的一些合成分布偏移以及组织分类数据集Camelyon17-WILDS。最后,我们给出一个基于高斯数据模型的候选理论,说明分布偏移引起的数据协方差变化如何影响观测到的相关性。 摘要:For machine learning systems to be reliable, we must understand their performance in unseen, out-of-distribution environments. In this paper, we empirically show that out-of-distribution performance is strongly correlated with in-distribution performance for a wide range of models and distribution shifts. Specifically, we demonstrate strong correlations between in-distribution and out-of-distribution performance on variants of CIFAR-10 & ImageNet, a synthetic pose estimation task derived from YCB objects, satellite imagery classification in FMoW-WILDS, and wildlife classification in iWildCam-WILDS. The strong correlations hold across model architectures, hyperparameters, training set size, and training duration, and are more precise than what is expected from existing domain adaptation theory. To complete the picture, we also investigate cases where the correlation is weaker, for instance some synthetic distribution shifts from CIFAR-10-C and the tissue classification dataset Camelyon17-WILDS. Finally, we provide a candidate theory based on a Gaussian data model that shows how changes in the data covariance arising from distribution shift can affect the observed correlations.
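
"准确率在一条线上"可以用很短的Python代码来检验。以下为假设性示例:acc_id、acc_ood 是一组虚构模型的分布内/分布外准确率,仅作演示;对准确率先做probit变换再做线性拟合是此类分析的常见做法,具体细节以论文为准:

import numpy as np
from scipy.stats import norm

# 一组假想模型的分布内 / 分布外准确率(数值纯属虚构)
acc_id  = np.array([0.85, 0.90, 0.92, 0.95, 0.97])
acc_ood = np.array([0.60, 0.68, 0.72, 0.78, 0.83])

# probit 变换:acc -> Phi^{-1}(acc)
p_id, p_ood = norm.ppf(acc_id), norm.ppf(acc_ood)

slope, intercept = np.polyfit(p_id, p_ood, deg=1)
r = np.corrcoef(p_id, p_ood)[0, 1]
print(f"线性拟合: probit(OOD) = {slope:.2f} * probit(ID) + {intercept:.2f}, r = {r:.3f}")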

【20】 Impossibility of What? Formal and Substantive Equality in Algorithmic Fairness 标题:何为不可能?算法公平中的形式平等与实质平等

作者:Ben Green 机构:Michigan Society of Fellows, Gerald R. Ford School of Public Policy 链接:https://arxiv.org/abs/2107.04642 摘要:面对不断加剧的社会与经济不平等危机,许多人转向算法决策,以期在社会中实现更大的公平。随着这些努力的深入,"算法公平"这一新兴领域中的论证方式日益影响着公平在实践中的体现。本文追问:算法公平是否为增进社会平等提供了合适的概念与实践工具?我认为,占主导地位的"形式化"算法公平方法难以胜任追求平等的框架,因为其狭窄的分析框架只能产生限制性的改革路径。鉴于这些缺陷,我提出另一种选择:一种以反对社会等级制度为核心的"实质性"算法公平方法,它为如何解决不平等提供了更宽广的分析,使我们能更富成效地探讨算法在对抗压迫中的作用。两种方法对"公平的不可能性"(算法公平的各数学定义之间互不相容)的回应,恰好体现了形式性与实质性算法公平的区别:形式性方法要求我们把"公平的不可能性"当作增进平等努力的硬性限制来接受;而实质性方法则允许我们摆脱"公平的不可能性",因为它提出的改革不受这一虚假两难的约束,也更有能力改善社会压迫的状况。 摘要:In the face of compounding crises of social and economic inequality, many have turned to algorithmic decision-making to achieve greater fairness in society. As these efforts intensify, reasoning within the burgeoning field of "algorithmic fairness" increasingly shapes how fairness manifests in practice. This paper interrogates whether algorithmic fairness provides the appropriate conceptual and practical tools for enhancing social equality. I argue that the dominant, "formal" approach to algorithmic fairness is ill-equipped as a framework for pursuing equality, as its narrow frame of analysis generates restrictive approaches to reform. In light of these shortcomings, I propose an alternative: a "substantive" approach to algorithmic fairness that centers opposition to social hierarchies and provides a more expansive analysis of how to address inequality. This substantive approach enables more fruitful theorizing about the role of algorithms in combatting oppression. The distinction between formal and substantive algorithmic fairness is exemplified by each approach's responses to the "impossibility of fairness" (an incompatibility between mathematical definitions of algorithmic fairness). While the formal approach requires us to accept the "impossibility of fairness" as a harsh limit on efforts to enhance equality, the substantive approach allows us to escape the "impossibility of fairness" by suggesting reforms that are not subject to this false dilemma and that are better equipped to ameliorate conditions of social oppression.

【21】 Computer-Aided Diagnosis of Low Grade Endometrial Stromal Sarcoma (LGESS) 标题:低度恶性子宫内膜间质肉瘤(LGESS)的计算机辅助诊断

作者:Xinxin Yang,Mark Stamp 链接:https://arxiv.org/abs/2107.05426 摘要:低度恶性子宫内膜间质肉瘤(LGESS)是一种罕见的癌症,约占所有子宫癌病例的0.2%。大约75%的LGESS患者最初被误诊为平滑肌瘤;后者是一种良性肿瘤,也称为子宫肌瘤。在这项研究中,我们使用分割和染色归一化算法对潜在LGESS患者的子宫组织活检图像进行预处理,随后应用多种经典机器学习模型和领先的深度学习模型,将组织图像分类为良性或恶性。在所考察的经典技术中,我们获得的最高分类准确率约为0.85,而最佳的深度学习模型达到约0.87。这些结果表明,经过适当训练的学习算法可以在LGESS的诊断中发挥有益作用。 摘要:Low grade endometrial stromal sarcoma (LGESS) is a rare form of cancer, accounting for about 0.2% of all uterine cancer cases. Approximately 75% of LGESS patients are initially misdiagnosed with leiomyoma, which is a type of benign tumor, also known as fibroids. In this research, uterine tissue biopsy images of potential LGESS patients are preprocessed using segmentation and staining normalization algorithms. A variety of classic machine learning and leading deep learning models are then applied to classify tissue images as either benign or cancerous. For the classic techniques considered, the highest classification accuracy we attain is about 0.85, while our best deep learning model achieves an accuracy of approximately 0.87.  These results indicate that properly trained learning algorithms can play a useful role in the diagnosis of LGESS.
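
下面给出"预处理后做良性/恶性二分类"流水线中分类一步的极简PyTorch示意。这是假设性示例:摘要未指明具体网络,这里以在ImageNet上预训练的ResNet-18微调为例;分割与染色归一化等预处理步骤以注释占位:

import torch
import torch.nn as nn
from torchvision import models

# 假设输入图像已完成分割与染色归一化(预处理步骤此处从略)
model = models.resnet18(pretrained=True)
model.fc = nn.Linear(model.fc.in_features, 2)   # 良性 / 恶性 两类

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """images: (B,3,224,224) 的归一化张量;labels: (B,) 的 0/1 标签。"""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()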

【22】 EndoUDA: A modality independent segmentation approach for endoscopy imaging 标题:EndoUDA:一种独立于模态的内窥镜成像分割方法

作者:Numan Celik,Sharib Ali,Soumya Gupta,Barbara Braden,Jens Rittscher 机构: Department of Engineering Science, Institute of Biomedical Engineering, University, Big Data Institute, University of Oxford, Li Ka Shing Centre for Health, Information and Discovery, Oxford, UK, NIHR Oxford Biomedical Research Centre, Oxford, UK 备注:10 pages, 3 figures, 3 tables. Accepted for MICCAI 2021 链接:https://arxiv.org/abs/2107.05342 摘要:胃肠道(GI)癌前病变需要对患者进行频繁监测以做风险分层。自动分割方法有助于更准确地评估风险区域,并可辅助治疗操作乃至切除。在临床实践中,除传统的白光成像(WLI)外,还会使用窄带成像(NBI)和荧光成像等辅助成像模态。然而,目前大多数分割方法是有监督的,且只针对单一模态数据集;本工作则利用一种与目标无关的无监督域自适应(UDA)技术,使模型能够泛化到未见过的目标模态。在此背景下,我们提出一种新的基于UDA的分割方法,它将变分自编码器和U-Net与共享的EfficientNet-B4主干相结合,并使用联合损失对目标样本进行潜空间优化。我们证明,仅用WLI(源)模态训练时,模型即可泛化到未见过的NBI(目标)模态。在上、下消化道内窥镜数据上的实验表明,与朴素的监督方法和最新的UDA分割方法相比,我们的方法是有效的。 摘要:Gastrointestinal (GI) cancer precursors require frequent monitoring for risk stratification of patients. Automated segmentation methods can help to assess risk areas more accurately, and assist in therapeutic procedures or even removal. In clinical practice, in addition to the conventional white-light imaging (WLI), complementary modalities such as narrow-band imaging (NBI) and fluorescence imaging are used. While today most segmentation approaches are supervised and only concentrate on a single modality dataset, this work exploits a target-independent unsupervised domain adaptation (UDA) technique that is capable of generalizing to an unseen target modality. In this context, we propose a novel UDA-based segmentation method that couples the variational autoencoder and U-Net with a common EfficientNet-B4 backbone, and uses a joint loss for latent-space optimization for target samples. We show that our model can generalize to unseen target NBI (target) modality when trained using only WLI (source) modality. Our experiments on both upper and lower GI endoscopy data show the effectiveness of our approach compared to naive supervised approach and state-of-the-art UDA segmentation methods.
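
摘要提到"用联合损失对目标样本做潜空间优化";下面是"分割损失 + VAE正则项"这种联合损失思想的极简PyTorch示意。这是假设性示例:具体网络结构、各项权重(beta)均为演示假设,与论文实现无关:

import torch
import torch.nn.functional as F

def joint_loss(seg_logits, seg_target, mu, logvar, beta=0.1):
    """分割交叉熵损失与潜空间KL正则的加权和。
    seg_logits: (B,C,H,W);seg_target: (B,H,W);beta 为演示用的假设权重。"""
    seg = F.cross_entropy(seg_logits, seg_target)
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return seg + beta * kl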

【23】 Metalearning Linear Bandits by Prior Update 标题:基于先验更新的线性Bandit元学习

作者:Amit Peleg,Naama Pearl,Ron Meir 机构:Technion, Israel, University of Haifa, Israel 链接:https://arxiv.org/abs/2107.05320 摘要:序贯决策的完全贝叶斯方法假设问题参数由已知的先验生成,而在实际应用中这种信息往往缺乏,需要通过学习来估计。在只有部分信息的决策设定中,这一问题更为严重:使用错误设定的先验可能导致探索不足和性能低下。在这项工作中,我们针对随机线性bandit与高斯先验的情形证明:只要先验估计足够接近真实先验,使用错误先验的算法的性能就接近使用真实先验的算法。接下来,我们讨论通过元学习(metalearning)来学习先验的任务:学习者在多个任务实例之间更新对先验的估计,以提升在未来任务上的表现;在每个任务内部,估计出的先验再根据到来的观测进行更新,同时选择动作以最大化期望回报。我们将该方案应用于线性bandit设定,给出算法与遗憾界,并通过与已知正确先验的算法对比证明了其有效性。我们的结果适用于一大类算法,例如汤普森采样(Thompson Sampling)和信息导向采样(Information Directed Sampling)。 摘要:Fully Bayesian approaches to sequential decision-making assume that problem parameters are generated from a known prior, while in practice, such information is often lacking, and needs to be estimated through learning. This problem is exacerbated in decision-making setups with partial information, where using a misspecified prior may lead to poor exploration and inferior performance. In this work we prove, in the context of stochastic linear bandits and Gaussian priors, that as long as the prior estimate is sufficiently close to the true prior, the performance of an algorithm that uses the misspecified prior is close to that of the algorithm that uses the true prior. Next, we address the task of learning the prior through metalearning, where a learner updates its estimate of the prior across multiple task instances in order to improve performance on future tasks. The estimated prior is then updated within each task based on incoming observations, while actions are selected in order to maximize expected reward. In this work we apply this scheme within a linear bandit setting, and provide algorithms and regret bounds, demonstrating its effectiveness, as compared to an algorithm that knows the correct prior. Our results hold for a broad class of algorithms, including, for example, Thompson Sampling and Information Directed Sampling.
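
下面给出"高斯先验下的线性bandit汤普森采样 + 共轭后验更新"的极简NumPy示意。这是假设性示例:单步采样与共轭高斯更新是标准结果;跨任务的先验更新此处用最简单的样本矩估计代替论文中的非参数估计,仅作演示:

import numpy as np

def thompson_round(mu, Sigma, arms, theta, noise_std, rng):
    """单步汤普森采样:从后验 N(mu, Sigma) 采样参数、选臂、
    观测带噪回报并做共轭高斯后验更新。arms: (K,d);theta: 真实参数。"""
    theta_s = rng.multivariate_normal(mu, Sigma)
    a = arms[np.argmax(arms @ theta_s)]
    r = a @ theta + noise_std * rng.standard_normal()
    P = np.linalg.inv(Sigma)                     # 当前精度矩阵
    P_new = P + np.outer(a, a) / noise_std**2
    Sigma_new = np.linalg.inv(P_new)
    mu_new = Sigma_new @ (P @ mu + a * r / noise_std**2)
    return mu_new, Sigma_new

def update_prior(task_posterior_means):
    """跨任务的元学习先验更新:用各任务后验均值的样本矩粗略估计
    先验的均值与协方差(演示用的矩估计,非论文方法)。"""
    means = np.stack(task_posterior_means)
    return means.mean(axis=0), np.cov(means, rowvar=False)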

【24】 Improving Efficiency and Accuracy of Causal Discovery Using a Hierarchical Wrapper 标题:使用分层包装器提高因果发现的效率和准确性

作者:Shami Nisimov,Yaniv Gurwicz,Raanan Y. Rohekar,Gal Novik 机构:Intel Labs 备注:The 37th Conference on Uncertainty in Artificial Intelligence (UAI 2021), Workshop on Tractable Probabilistic Modeling 链接:https://arxiv.org/abs/2107.05001 摘要:从观测数据中发现因果关系是许多科学分支的重要工具。在一定假设下,它使科学家能够解释现象、做出预测和决策。在大样本极限下,已有健全(sound)且完备(complete)的因果发现算法被提出,用于搜索表示因果关系的有向无环图(DAG)或其等价类。然而,现实中只有有限的训练数据可用,这限制了此类算法所用统计检验的功效,导致推断出的因果模型出现错误。通常的应对方式是设计一种尽量少用统计检验的策略。本文为现有的基于约束的因果发现算法引入了这样一种策略,其形式是一个保持健全性与完备性的递归包装器:它从一开始就使用归一化最小割准则递归地对观测变量进行聚类,在回溯过程中调用基线因果发现算法学习局部子图,再将这些子图合并并确保完备性。通过消融实验、合成数据以及常用的真实基准,我们证明了该方法与基线算法相比需要显著更少的统计检验、学到更准确的图,并且运行时间更短。 摘要:Causal discovery from observational data is an important tool in many branches of science. Under certain assumptions it allows scientists to explain phenomena, predict, and make decisions. In the large sample limit, sound and complete causal discovery algorithms have been previously introduced, where a directed acyclic graph (DAG), or its equivalence class, representing causal relations is searched. However, in real-world cases, only finite training data is available, which limits the power of statistical tests used by these algorithms, leading to errors in the inferred causal model. This is commonly addressed by devising a strategy for using as few as possible statistical tests. In this paper, we introduce such a strategy in the form of a recursive wrapper for existing constraint-based causal discovery algorithms, which preserves soundness and completeness. It recursively clusters the observed variables using the normalized min-cut criterion from the outset, and uses a baseline causal discovery algorithm during backtracking for learning local sub-graphs. It then combines them and ensures completeness. By an ablation study, using synthetic data, and by common real-world benchmarks, we demonstrate that our approach requires significantly fewer statistical tests, learns more accurate graphs, and requires shorter run-times than the baseline algorithm.
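
下面给出"递归聚类 + 回溯时在局部变量子集上调用基线算法"这一包装器骨架的Python示意。这是假设性示例:base_discovery 是任意基于约束的因果发现过程的占位函数;聚类用sklearn的谱聚类近似归一化最小割,回溯时的子图合并与补边步骤从略:

import numpy as np
from sklearn.cluster import SpectralClustering

def recursive_wrapper(data, variables, base_discovery, min_size=4):
    """data: (样本数, 变量数) 矩阵;variables: 当前变量下标列表。
    递归二分变量集,叶子节点上调用 base_discovery 学习局部子图。"""
    if len(variables) <= min_size:
        return [base_discovery(data[:, variables], variables)]
    # 用变量间相关系数的绝对值作相似度,谱聚类近似归一化最小割
    sim = np.abs(np.corrcoef(data[:, variables], rowvar=False))
    labels = SpectralClustering(n_clusters=2, affinity="precomputed",
                                random_state=0).fit_predict(sim)
    subgraphs = []
    for c in (0, 1):
        subset = [v for v, l in zip(variables, labels) if l == c]
        subgraphs += recursive_wrapper(data, subset, base_discovery, min_size)
    # 真实算法在回溯时还需合并子图并补全跨簇的边以确保完备性,此处从略
    return subgraphs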
