
Machine Learning arXiv Digest [11.9]

Author: arXiv每日学术速递 (WeChat official account)
Published: 2021-11-17 10:54:40
Included in the column: arXiv每日学术速递

cs.LG: 164 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (5 papers)

【1】 Directional Message Passing on Molecular Graphs via Synthetic Coordinates (https://arxiv.org/abs/2111.04718)

Authors: Johannes Klicpera, Chandan Yeshwanth, Stephan Günnemann
Affiliations: Technical University of Munich, Germany
Note: Published as a conference paper at NeurIPS 2021
Abstract: Graph neural networks that leverage coordinates via directional message passing have recently set the state of the art on multiple molecular property prediction tasks. However, they rely on atom position information that is often unavailable, and obtaining it is usually prohibitively expensive or even impossible. In this paper we propose synthetic coordinates that enable the use of advanced GNNs without requiring the true molecular configuration. We propose two distances as synthetic coordinates: distance bounds that specify the rough range of molecular configurations, and graph-based distances using a symmetric variant of personalized PageRank. To leverage both distance and angular information, we propose a method of transforming normal graph neural networks into directional MPNNs. We show that with this transformation we can reduce the error of a normal graph neural network by 55% on the ZINC benchmark. We furthermore set the state of the art on ZINC and coordinate-free QM9 by incorporating synthetic coordinates in the SMP and DimeNet++ models. Our implementation is available online.
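
A minimal sketch (plain NumPy) of the second synthetic coordinate described above: a graph-based distance derived from a symmetric variant of personalized PageRank (PPR). The closed-form PPR matrix, the averaging symmetrization, and the negative-log score-to-distance map are assumptions for illustration; the paper's exact choices may differ.

```python
import numpy as np

def ppr_matrix(adj: np.ndarray, alpha: float = 0.15) -> np.ndarray:
    """Closed-form personalized PageRank: alpha * (I - (1 - alpha) * P)^(-1),
    with P the row-normalized adjacency (random-walk transition matrix)."""
    n = adj.shape[0]
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    P = adj / deg
    return alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * P)

def symmetric_ppr_distance(adj: np.ndarray, alpha: float = 0.15, eps: float = 1e-9) -> np.ndarray:
    """Symmetrize PPR scores and turn them into pseudo-distances (high score -> small distance)."""
    ppr = ppr_matrix(adj, alpha)
    sym = 0.5 * (ppr + ppr.T)        # symmetric variant (averaging assumed)
    dist = -np.log(sym + eps)        # monotone score-to-distance map (assumed)
    np.fill_diagonal(dist, 0.0)
    return dist

# Toy molecular graph (a 4-cycle) as an adjacency matrix
A = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
print(symmetric_ppr_distance(A).round(2))
```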

【2】 threaTrace: Detecting and Tracing Host-based Threats in Node Level Through Provenance Graph Learning (https://arxiv.org/abs/2111.04333)

Authors: Su Wang, Zhiliang Wang, Tao Zhou, Xia Yin, Dongqi Han, Han Zhang, Hongbin Sun, Xingang Shi, Jiahai Yang
Note: 13 pages, 6 figures
Abstract: Host-based threats such as Program Attack, Malware Implantation, and Advanced Persistent Threats (APT) are commonly adopted by modern attackers. Recent studies propose leveraging the rich contextual information in data provenance to detect threats in a host. Data provenance is a directed acyclic graph constructed from system audit data. Nodes in a provenance graph represent system entities (e.g., processes and files) and edges represent system calls in the direction of information flow. However, previous studies, which extract features of the whole provenance graph, are not sensitive to the small number of threat-related entities and thus result in low performance when hunting stealthy threats. We present threaTrace, an anomaly-based detector that detects host-based threats at the system entity level without prior knowledge of attack patterns. We tailor GraphSAGE, an inductive graph neural network, to learn every benign entity's role in a provenance graph. threaTrace is a real-time system that scales to monitoring a long-running host and is capable of detecting host-based intrusions in their early phase. We evaluate threaTrace on three public datasets. The results show that threaTrace outperforms three state-of-the-art host intrusion detection systems.

【3】 Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation (https://arxiv.org/abs/2111.04318)

Authors: Fenglin Liu, Chenyu You, Xian Wu, Shen Ge, Sheng Wang, Xu Sun
Affiliations: MOE Key Laboratory of Computational Linguistics, School of EECS, Peking University; School of ECE, Peking University; Paul G. Allen School of Computer Science and Engineering, University of Washington; Department of Electrical Engineering, Yale University
Abstract: Medical report generation, which aims to automatically generate a long and coherent report for a given medical image, has been receiving growing research interest. Existing approaches mainly adopt a supervised manner and rely heavily on coupled image-report pairs. However, in the medical domain, building a large-scale image-report paired dataset is both time-consuming and expensive. To relax the dependency on paired data, we propose an unsupervised model, Knowledge Graph Auto-Encoder (KGAE), which accepts independent sets of images and reports in training. KGAE consists of a pre-constructed knowledge graph, a knowledge-driven encoder and a knowledge-driven decoder. The knowledge graph works as the shared latent space to bridge the visual and textual domains; the knowledge-driven encoder projects medical images and reports to the corresponding coordinates in this latent space, and the knowledge-driven decoder generates a medical report given a coordinate in this space. Since the knowledge-driven encoder and decoder can be trained with independent sets of images and reports, KGAE is unsupervised. The experiments show that the unsupervised KGAE generates desirable medical reports without using any image-report training pairs. Moreover, KGAE can also work in both semi-supervised and supervised settings, and accept paired images and reports in training. By further fine-tuning with image-report pairs, KGAE consistently outperforms the current state-of-the-art models on two datasets.

【4】 Graph Robustness Benchmark: Benchmarking the Adversarial Robustness of Graph Machine Learning (https://arxiv.org/abs/2111.04314)

Authors: Qinkai Zheng, Xu Zou, Yuxiao Dong, Yukuo Cen, Da Yin, Jiarong Xu, Yang Yang, Jie Tang
Affiliations: Department of Computer Science and Technology, Tsinghua University; Microsoft Research, Redmond; Fudan University; Zhejiang University
Note: 21 pages, 12 figures, NeurIPS 2021 Datasets and Benchmarks Track
Abstract: Adversarial attacks on graphs have posed a major threat to the robustness of graph machine learning (GML) models. Naturally, there is an ever-escalating arms race between attackers and defenders. However, the strategies behind both sides are often not fairly compared under the same and realistic conditions. To bridge this gap, we present the Graph Robustness Benchmark (GRB) with the goal of providing a scalable, unified, modular, and reproducible evaluation of the adversarial robustness of GML models. GRB standardizes the process of attacks and defenses by 1) developing scalable and diverse datasets, 2) modularizing the attack and defense implementations, and 3) unifying the evaluation protocol in refined scenarios. By leveraging the GRB pipeline, end-users can focus on the development of robust GML models with automated data processing and experimental evaluations. To support open and reproducible research on graph adversarial learning, GRB also hosts public leaderboards across different scenarios. As a starting point, we conduct extensive experiments to benchmark baseline techniques. GRB is open-source and welcomes contributions from the community. Datasets, code, and leaderboards are available at https://cogdl.ai/grb/home.

【5】 Deep Unsupervised Active Learning on Learnable Graphs (https://arxiv.org/abs/2111.04286)

Authors: Handong Ma, Changsheng Li, Xinchu Shi, Ye Yuan, Guoren Wang
Affiliations: SCSE, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China; Meituan
Abstract: Recently, deep learning has been successfully applied to unsupervised active learning. However, current methods attempt to learn a nonlinear transformation via an auto-encoder while ignoring sample relations, leaving huge room to design more effective representation learning mechanisms for unsupervised active learning. In this paper, we propose a novel deep unsupervised Active Learning model via Learnable Graphs, named ALLG. ALLG benefits from learning optimal graph structures to acquire better sample representations and select representative samples. To make the learnt graph structure more stable and effective, we take a $k$-nearest neighbor graph as a prior and learn a relation-propagation graph structure. We also incorporate shortcut connections among different layers, which can alleviate the well-known over-smoothing problem to some extent. To the best of our knowledge, this is the first attempt to leverage graph structure learning for unsupervised active learning. Extensive experiments performed on six datasets demonstrate the efficacy of our method.
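
A minimal sketch of the $k$-nearest-neighbor graph prior that ALLG starts from, using scikit-learn; the learned relation-propagation structure built on top of this prior is beyond a few lines and is not reproduced here.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_prior(X: np.ndarray, k: int = 10) -> np.ndarray:
    """Dense symmetric kNN affinity matrix serving as the graph prior."""
    A = kneighbors_graph(X, n_neighbors=k, mode="connectivity", include_self=False).toarray()
    return np.maximum(A, A.T)   # symmetrize: i~j if either is among the other's kNN

X = np.random.default_rng(0).normal(size=(100, 16))   # 100 unlabeled samples, 16 features
prior = knn_prior(X, k=5)
print(prior.shape, int(prior.sum()))
```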

Transformer (3 papers)

【1】 AI-UPV at IberLEF-2021 DETOXIS task: Toxicity Detection in Immigration-Related Web News Comments Using Transformers and Statistical Models (https://arxiv.org/abs/2111.04530)

Authors: Angel Felipe Magnossão de Paula, Ipek Baris Schlicht
Affiliations: Universitat Politècnica de València
Note: 20 pages. Presented at IberLEF. See this http URL
Abstract: This paper describes our participation in the DEtection of TOXicity in comments In Spanish (DETOXIS) shared task 2021 at the 3rd Workshop on Iberian Languages Evaluation Forum. The shared task is divided into two related classification tasks: (i) Task 1: toxicity detection; and (ii) Task 2: toxicity level detection. They focus on the xenophobic problem exacerbated by the spread of toxic comments posted in different online news articles related to immigration. One of the necessary efforts towards mitigating this problem is to detect toxicity in the comments. Our main objective was to implement an accurate model to detect xenophobia in comments about web news articles within the DETOXIS shared task 2021, based on the competition's official metrics: the F1-score for Task 1 and the Closeness Evaluation Metric (CEM) for Task 2. To solve the tasks, we worked with two types of machine learning models: (i) statistical models and (ii) Deep Bidirectional Transformers for Language Understanding (BERT) models. We obtained our best results in both tasks using BETO, a BERT model trained on a large Spanish corpus. We obtained 3rd place in the official Task 1 ranking with an F1-score of 0.5996, and 6th place in the official Task 2 ranking with a CEM of 0.7142. Our results suggest that: (i) BERT models obtain better results than statistical models for toxicity detection in text comments; (ii) monolingual BERT models have an advantage over multilingual BERT models for toxicity detection in their pre-trained language.
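
For readers who want the core ingredient in code, below is a hedged sketch of fine-tuning a Spanish BERT for binary toxicity classification (Task 1) with Hugging Face transformers. The checkpoint name is the commonly published BETO identifier and should be treated as an assumption, as should the toy comments; the full training loop, preprocessing, and the Task 2 ordinal head are omitted.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# BETO is commonly published as "dccuchile/bert-base-spanish-wwm-cased" (assumed here)
name = "dccuchile/bert-base-spanish-wwm-cased"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)  # toxic / not toxic

comments = ["comentario de ejemplo", "otro comentario"]   # placeholder inputs
labels = torch.tensor([0, 1])

batch = tok(comments, padding=True, truncation=True, return_tensors="pt")
out = model(**batch, labels=labels)    # cross-entropy loss is computed internally
out.loss.backward()                    # one fine-tuning step (optimizer omitted)
print(float(out.loss))
```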

【2】 Multi-Airport Delay Prediction with Transformers (https://arxiv.org/abs/2111.04494)

Authors: Liya Wang, Alex Tien, Jason Chou
Affiliations: The MITRE Corporation, McLean, VA, United States
Abstract: Airport performance prediction with a reasonable look-ahead time is a challenging task that various prior research has attempted. Traffic, demand, weather, and traffic management actions are all critical inputs to any prediction model. In this paper, a novel approach based on the Temporal Fusion Transformer (TFT) is proposed to predict departure and arrival delays simultaneously for multiple airports at once. This approach can capture the complex temporal dynamics of the inputs known at prediction time and then forecast selected delay metrics up to four hours into the future. When dealing with weather inputs, a self-supervised learning (SSL) model was developed to encode high-dimensional weather data into a much lower-dimensional representation to make the training of the TFT more efficient and effective. Initial results show that the TFT-based delay prediction model achieves satisfactory performance, measured by smaller prediction errors on a testing dataset. In addition, an interpretability analysis of the model outputs identifies the important input factors for delay prediction. The proposed approach is expected to help air traffic managers or decision makers gain insights about traffic management actions on delay mitigation and, once operationalized, provide enough lead time to plan for predicted performance degradation.
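
The abstract does not detail the SSL weather encoder, so the PyTorch snippet below is only a generic stand-in: a plain autoencoder compressing high-dimensional weather vectors into a low-dimensional code that could be fed to the TFT. All dimensions and the architecture are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WeatherAE(nn.Module):
    """Plain autoencoder: compress raw weather features into a small code."""
    def __init__(self, in_dim: int = 512, code_dim: int = 16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(), nn.Linear(128, code_dim))
        self.dec = nn.Sequential(nn.Linear(code_dim, 128), nn.ReLU(), nn.Linear(128, in_dim))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

model = WeatherAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(64, 512)                   # a batch of high-dimensional weather vectors
recon, code = model(x)
loss = nn.functional.mse_loss(recon, x)    # self-supervised reconstruction objective
loss.backward()
opt.step()
print(code.shape)                          # torch.Size([64, 16]): the low-dim representation
```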

【3】 Flight Demand Forecasting with Transformers (https://arxiv.org/abs/2111.04471)

Authors: Liya Wang, Amy Mykityshyn, Craig Johnson, Jillian Cheng
Affiliations: The MITRE Corporation, McLean, VA, United States; Federal Aviation Administration
Note: arXiv admin note: substantial text overlap with arXiv:2011.04476
Abstract: Transformers have become the de facto standard in the natural language processing (NLP) field. They have also gained momentum in computer vision and other domains. Transformers enable artificial intelligence (AI) models to dynamically focus on certain parts of their input and thus reason more effectively. Inspired by the success of transformers, we adopted this technique to predict strategic flight departure demand over multiple horizons. This work was conducted in support of a MITRE-developed mobile application, Pacer, which displays predicted departure demand to general aviation (GA) flight operators so they can have better situational awareness of the potential for departure delays during busy periods. Field demonstrations involving Pacer's previously designed rule-based prediction method showed that the prediction accuracy of departure demand still has room for improvement. This research strives to improve prediction accuracy from two key aspects: better data sources and robust forecasting algorithms. We leveraged two data sources, Aviation System Performance Metrics (ASPM) and System Wide Information Management (SWIM), as our input. We then trained forecasting models with the temporal fusion transformer (TFT) for five different airports. Case studies show that TFTs can outperform traditional forecasting methods by large margins, and they yield better predictions across diverse airports with better interpretability.

GAN | adversarial | attacks | generation (8 papers)

【1】 Accelerating GAN training using highly parallel hardware on public cloud (https://arxiv.org/abs/2111.04628)

Authors: Renato Cardoso, Dejan Golubovic, Ignacio Peluaga Lozada, Ricardo Rocha, João Fernandes, Sofia Vallecorsa
Affiliations: CERN, Esplanade des Particules, Geneva, Switzerland
Abstract: With the increasing number of machine and deep learning applications in High Energy Physics, easy access to dedicated infrastructure represents a requirement for fast and efficient R&D. This work explores different types of cloud services to train a Generative Adversarial Network (GAN) in a parallel environment, using the TensorFlow data-parallel strategy. More specifically, we parallelize the training process on multiple GPUs and Google Tensor Processing Units (TPU) and we compare two algorithms: the TensorFlow built-in logic and a custom loop, optimised for higher control of the elements assigned to each GPU worker or TPU core. The quality of the generated data is compared to Monte Carlo simulation. Linear speed-up of the training process is obtained, while retaining most of the performance in terms of physics results. Additionally, we benchmark the aforementioned approaches, at scale, over multiple GPU nodes, deploying the training process on different public cloud providers, seeking overall efficiency and cost-effectiveness. The combination of data science, cloud deployment options and associated economics allows bursting out heterogeneously, exploring the full potential of cloud-based services.
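
For the custom-loop variant, the TensorFlow MirroredStrategy pattern looks roughly like the sketch below. The model and loss are stand-ins (a real GAN pairs a generator with a discriminator and adversarial losses); only the distribution mechanics, creating variables in scope, running a per-replica step, and reducing across replicas, reflect the approach described. A real pipeline would also split batches with strategy.experimental_distribute_dataset.

```python
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()           # one replica per visible GPU
GLOBAL_BATCH = 64 * strategy.num_replicas_in_sync

with strategy.scope():                                # variables must be created in scope
    generator = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(64,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(784),
    ])
    opt = tf.keras.optimizers.Adam(1e-4)

def step_fn(noise, real):
    with tf.GradientTape() as tape:
        fake = generator(noise, training=True)
        # stand-in loss, scaled by the global batch so per-replica gradients sum correctly
        loss = tf.reduce_sum(tf.square(fake - real)) / GLOBAL_BATCH
    grads = tape.gradient(loss, generator.trainable_variables)
    opt.apply_gradients(zip(grads, generator.trainable_variables))
    return loss

@tf.function
def train_step(noise, real):
    per_replica = strategy.run(step_fn, args=(noise, real))
    return strategy.reduce(tf.distribute.ReduceOp.SUM, per_replica, axis=None)

noise = tf.random.normal([GLOBAL_BATCH, 64])
real = tf.random.normal([GLOBAL_BATCH, 784])
print(float(train_step(noise, real)))
```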

【2】 BARFED: Byzantine Attack-Resistant Federated Averaging Based on Outlier Elimination (https://arxiv.org/abs/2111.04550)

Authors: Ece Isik-Polat, Gorkem Polat, Altan Kocyigit
Affiliations: Graduate School of Informatics, Middle East Technical University
Abstract: In federated learning, each participant trains its local model with its own data, and a global model is formed at a trusted server by aggregating the model updates coming from these participants. Since the server has no effect on or visibility into the participants' training procedure, so as to ensure privacy, the global model becomes vulnerable to attacks such as data poisoning and model poisoning. Although many defense algorithms have recently been proposed to address these attacks, they often make strong assumptions that do not agree with the nature of federated learning, such as non-IID datasets. Moreover, they mostly lack comprehensive experimental analyses. In this work, we propose a defense algorithm called BARFED that makes no assumptions about data distribution, update similarity of participants, or the ratio of malicious participants. BARFED mainly considers the outlier status of participant updates for each layer of the model architecture, based on the distance to the global model. Hence, participants that do not have any outlier layer are involved in model aggregation. We perform extensive experiments on many grounds and show that the proposed approach provides a robust defense against different attacks.
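
A toy rendering of the layer-wise idea: score each participant's per-layer distance to the global model, flag outliers, and average only participants with no outlier layer. The IQR rule below is an assumption for illustration; the paper defines its own outlier criterion.

```python
import numpy as np

def barfed_aggregate(global_w, updates):
    """Aggregate only participants with no outlier layer (BARFED-style sketch)."""
    n = len(updates)
    outlier = np.zeros(n, dtype=bool)
    for layer in range(len(global_w)):
        d = np.array([np.linalg.norm(u[layer] - global_w[layer]) for u in updates])
        q1, q3 = np.percentile(d, [25, 75])
        outlier |= d > q3 + 1.5 * (q3 - q1)      # IQR upper fence (assumed rule)
    keep = [u for u, bad in zip(updates, outlier) if not bad]
    return [np.mean([u[l] for u in keep], axis=0) for l in range(len(global_w))]

rng = np.random.default_rng(0)
gw = [rng.normal(size=(4, 4)), rng.normal(size=4)]                       # toy 2-layer model
ups = [[w + rng.normal(scale=0.1, size=w.shape) for w in gw] for _ in range(9)]
ups.append([w + 10.0 for w in gw])                                       # one Byzantine client
print([a.shape for a in barfed_aggregate(gw, ups)])                      # attacker filtered out
```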

【3】 Robust and Information-theoretically Safe Bias Classifier against Adversarial Attacks (https://arxiv.org/abs/2111.04404)

Authors: Lijia Yu, Xiao-Shan Gao
Affiliations: Academy of Mathematics and Systems Science, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Beijing, China
Abstract: In this paper, the bias classifier is introduced, that is, the bias part of a DNN with ReLU as the activation function is used as a classifier. The work is motivated by the fact that the bias part is a piecewise constant function with zero gradient and hence cannot be directly attacked by gradient-based methods, such as FGSM, to generate adversaries. The existence of the bias classifier is proved, and an effective training method for the bias classifier is proposed. It is proved that by adding a proper random first-degree part to the bias classifier, an information-theoretically safe classifier against the original-model gradient-based attack is obtained, in the sense that the attack generates a totally random direction for generating adversaries. This seems to be the first time that the concept of an information-theoretically safe classifier has been proposed. Several attack methods for the bias classifier are proposed, and numerical experiments show that the bias classifier is more robust than DNNs against these attacks in most cases.

【4】 Get a Model! Model Hijacking Attack Against Machine Learning Models (https://arxiv.org/abs/2111.04394)

Authors: Ahmed Salem, Michael Backes, Yang Zhang
Affiliations: CISPA Helmholtz Center for Information Security
Note: To appear in NDSS 2022
Abstract: Machine learning (ML) has established itself as a cornerstone for various critical applications ranging from autonomous driving to authentication systems. However, with this increasing adoption rate of machine learning models, multiple attacks have emerged. One class of such attacks is training-time attacks, whereby an adversary executes their attack before or during the machine learning model's training. In this work, we propose a new training-time attack against computer vision based machine learning models, namely the model hijacking attack. The adversary aims to hijack a target model to execute a different task than its original one without the model owner noticing. Model hijacking can cause accountability and security risks since a hijacked model owner can be framed for having their model offer illegal or unethical services. Model hijacking attacks are launched in the same way as existing data poisoning attacks. However, one requirement of the model hijacking attack is to be stealthy, i.e., the data samples used to hijack the target model should look similar to the model's original training dataset. To this end, we propose two different model hijacking attacks, namely Chameleon and Adverse Chameleon, based on a novel encoder-decoder style ML model, namely the Camouflager. Our evaluation shows that both of our model hijacking attacks achieve a high attack success rate with a negligible drop in model utility.

【5】 Geometrically Adaptive Dictionary Attack on Face Recognition (https://arxiv.org/abs/2111.04371)

Authors: Junyoung Byun, Hyojun Go, Changick Kim
Affiliations: Korea Advanced Institute of Science and Technology (KAIST)
Note: Accepted at WACV 2022
Abstract: CNN-based face recognition models have brought remarkable performance improvements, but they are vulnerable to adversarial perturbations. Recent studies have shown that adversaries can fool the models even if they can only access the models' hard-label output. However, since many queries are needed to find imperceptible adversarial noise, reducing the number of queries is crucial for these attacks. In this paper, we point out two limitations of existing decision-based black-box attacks. We observe that they waste queries on background noise optimization, and they do not take advantage of adversarial perturbations generated for other images. We exploit 3D face alignment to overcome these limitations and propose a general strategy for query-efficient black-box attacks on face recognition named Geometrically Adaptive Dictionary Attack (GADA). Our core idea is to create an adversarial perturbation in the UV texture map and project it onto the face in the image. It greatly improves query efficiency by limiting the perturbation search space to the facial area and effectively recycling previous perturbations. We apply the GADA strategy to two existing attack methods and show overwhelming performance improvements in experiments on the LFW and CPLFW datasets. Furthermore, we also present a novel attack strategy that can circumvent query-similarity-based stateful detection, which identifies the process of query-based black-box attacks.

【6】 Characterizing the adversarial vulnerability of speech self-supervised learning (https://arxiv.org/abs/2111.04330)

Authors: Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-yi Lee, Helen Meng
Affiliations: Graduate Institute of Communication Engineering, National Taiwan University; Human-Computer Communications Laboratory, The Chinese University of Hong Kong
Abstract: A leaderboard named Speech processing Universal PERformance Benchmark (SUPERB), which aims at benchmarking the performance of a shared self-supervised learning (SSL) speech model across various downstream speech tasks with minimal modification of architectures and a small amount of data, has fueled research on speech representation learning. SUPERB demonstrates that speech SSL upstream models improve the performance of various downstream tasks through just minimal adaptation. As the paradigm of a self-supervised upstream model followed by downstream tasks attracts more attention in the speech community, characterizing the adversarial robustness of such a paradigm is of high priority. In this paper, we make the first attempt to investigate the adversarial vulnerability of such a paradigm under attacks from both zero-knowledge adversaries and limited-knowledge adversaries. The experimental results illustrate that the paradigm proposed by SUPERB is seriously vulnerable to limited-knowledge adversaries, and the attacks generated by zero-knowledge adversaries exhibit transferability. XAB tests verify the imperceptibility of the crafted adversarial attacks.

【7】 On pseudo-absence generation and machine learning for locust breeding ground prediction in Africa (https://arxiv.org/abs/2111.03904)

Authors: Ibrahim Salihu Yusuf, Kale-ab Tessera, Thomas Tumiel, Sella Nevo, Arnu Pretorius
Affiliations: InstaDeep, Google Research
Note: AI for Humanitarian Assistance and Disaster Response (AI+HADR) workshop, NeurIPS 2021
Abstract: Desert locust outbreaks threaten the food security of a large part of Africa and have affected the livelihoods of millions of people over the years. Machine learning (ML) has been demonstrated as an effective approach to locust distribution modelling, which could assist in early warning. ML requires a significant amount of labelled data to train. Most publicly available labelled data on locusts are presence-only data, where only the sightings of locusts being present at a location are recorded. Therefore, prior work using ML has resorted to pseudo-absence generation methods as a way to circumvent this issue. The most commonly used approach is to randomly sample points in a region of interest while ensuring that these sampled pseudo-absence points are at least a specific distance away from true presence points. In this paper, we compare this random sampling approach to more advanced pseudo-absence generation methods, such as environmental profiling and optimal background extent limitation, specifically for predicting desert locust breeding grounds in Africa. Interestingly, we find that for the algorithms we tested, namely logistic regression (LR), gradient boosting, random forests and maximum entropy, all popular in prior work, the logistic model performed significantly better than the more sophisticated ensemble methods, both in terms of prediction accuracy and F1 score. Although background extent limitation combined with random sampling boosted performance for ensemble methods, for LR this was not the case; instead, a significant improvement was obtained when using environmental profiling. In light of this, we conclude that a simpler ML approach such as logistic regression combined with more advanced pseudo-absence generation, specifically environmental profiling, can be a sensible and effective approach to predicting locust breeding grounds across Africa.
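
The random-sampling baseline the paper compares against is easy to state concretely. A minimal sketch follows, with Euclidean distance and a toy bounding box as simplifying assumptions; a real pipeline would use geodesic distances and actual survey extents.

```python
import numpy as np

def random_pseudo_absences(presence, n, bounds, min_dist=0.5, seed=0):
    """Sample points uniformly in the region of interest, keeping only those at
    least `min_dist` (degrees, here) from every true presence point."""
    rng = np.random.default_rng(seed)
    (lo_x, hi_x), (lo_y, hi_y) = bounds
    out = []
    while len(out) < n:
        p = rng.uniform([lo_x, lo_y], [hi_x, hi_y])      # candidate (lon, lat)
        if np.min(np.linalg.norm(presence - p, axis=1)) >= min_dist:
            out.append(p)
    return np.array(out)

presence = np.array([[10.0, 5.0], [11.5, 4.2], [9.8, 6.1]])   # toy sightings
absences = random_pseudo_absences(presence, n=10, bounds=((5, 15), (0, 10)))
print(absences.shape)   # (10, 2)
```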

【8】 Structure-aware generation of drug-like molecules (https://arxiv.org/abs/2111.04107)

Authors: Pavol Drotár, Arian Rokkum Jamasb, Ben Day, Cătălina Cangea, Pietro Liò
Affiliations: University of Cambridge
Abstract: Structure-based drug design involves finding ligand molecules that exhibit structural and chemical complementarity to protein pockets. Deep generative methods have shown promise in proposing novel molecules from scratch (de-novo design), avoiding exhaustive virtual screening of chemical space. Most generative de-novo models fail to incorporate detailed ligand-protein interactions and 3D pocket structures. We propose a novel supervised model that generates molecular graphs jointly with 3D poses in a discretised molecular space. Molecules are built atom-by-atom inside pockets, guided by structural information from crystallographic data. We evaluate our model using a docking benchmark and find that guided generation improves predicted binding affinities by 8% and drug-likeness scores by 10% over the baseline. Furthermore, our model proposes molecules with binding scores exceeding some known ligands, which could be useful in future wet-lab studies.

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (6 papers)

【1】 Evaluating Predictive Uncertainty and Robustness to Distributional Shift Using Real World Data (https://arxiv.org/abs/2111.04665)

Authors: Kumud Lakara, Akshat Bhandari, Pratinav Seth, Ujjwal Verma
Note: 6 pages, 3 figures, 4 tables
Abstract: Most machine learning models operate under the assumption that the training, testing and deployment data are independent and identically distributed (i.i.d.). This assumption doesn't generally hold true in a natural setting. Usually, the deployment data are subject to various types of distributional shifts, and the change in a model's performance is proportional to this shift in the dataset's distribution. Thus it becomes necessary to evaluate a model's uncertainty and robustness to distributional shifts to get a realistic estimate of its expected performance on real-world data. Present methods for evaluating uncertainty and a model's robustness are lacking and often fail to paint the full picture. Moreover, most analysis so far has primarily focused on classification tasks. In this paper, we propose more insightful metrics for general regression tasks using the Shifts Weather Prediction Dataset. We also present an evaluation of the baseline methods using these metrics.

【2】 S3RP: Self-Supervised Super-Resolution and Prediction for Advection-Diffusion Process (https://arxiv.org/abs/2111.04639)

Authors: Chulin Wang, Kyongmin Yeo, Xiao Jin, Andres Codas, Levente J. Klein, Bruce Elmegreen
Affiliations: IBM T.J. Watson Research Center, Yorktown Heights, New York
Note: 9 pages, 8 figures
Abstract: We present a super-resolution model for an advection-diffusion process with limited information. While most super-resolution models assume high-resolution (HR) ground-truth data during training, in many cases such HR datasets are not readily accessible. Here, we show that a recurrent convolutional network trained with physics-based regularizations is able to reconstruct the HR information without having the HR ground-truth data. Moreover, considering the ill-posed nature of a super-resolution problem, we employ a recurrent Wasserstein autoencoder to model the uncertainty.

【3】 Uncertainty Quantification in Neural Differential Equations (https://arxiv.org/abs/2111.04207)

Authors: Olga Graf, Pablo Flores, Pavlos Protopapas, Karim Pichara
Abstract: Uncertainty quantification (UQ) helps to make trustworthy predictions based on collected observations and uncertain domain knowledge. With the increased usage of deep learning in various applications, the need for efficient UQ methods that can make deep models more reliable has increased as well. Among the applications that can benefit from effective handling of uncertainty are deep learning based differential equation (DE) solvers. We adapt several state-of-the-art UQ methods to obtain the predictive uncertainty of DE solutions and show the results on four different DE types.

【4】 Uncertainty Calibration for Ensemble-Based Debiasing Methods (https://arxiv.org/abs/2111.04104)

Authors: Ruibin Xiong, Yimeng Chen, Liang Pang, Xueqi Chen, Yanyan Lan
Affiliations: CAS Key Laboratory of Network Data Science and Technology, Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Baidu Inc.; Academy of Mathematics and Systems Science, Chinese Academy of Sciences
Abstract: Ensemble-based debiasing methods have been shown to be effective in mitigating the reliance of classifiers on specific dataset biases by exploiting the output of a bias-only model to adjust the learning target. In this paper, we focus on the bias-only model in these ensemble-based methods, which plays an important role but has not gained much attention in the existing literature. Theoretically, we prove that debiasing performance can be damaged by inaccurate uncertainty estimations of the bias-only model. Empirically, we show that existing bias-only models fall short in producing accurate uncertainty estimations. Motivated by these findings, we propose to conduct calibration on the bias-only model, thus achieving a three-stage ensemble-based debiasing framework, including bias modeling, model calibrating, and debiasing. Experimental results on NLI and fact verification tasks show that our proposed three-stage debiasing framework consistently outperforms the traditional two-stage one in out-of-distribution accuracy.
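
The abstract does not name the calibrator used in the "model calibrating" stage, so temperature scaling, a standard post-hoc choice, serves here as a hedged example of calibrating a bias-only model's confidence on held-out data:

```python
import torch

def fit_temperature(logits: torch.Tensor, labels: torch.Tensor) -> float:
    """Learn a single temperature T > 0 minimizing NLL on held-out data."""
    log_t = torch.zeros(1, requires_grad=True)           # optimize log T to keep T positive
    opt = torch.optim.LBFGS([log_t], lr=0.1, max_iter=50)

    def closure():
        opt.zero_grad()
        loss = torch.nn.functional.cross_entropy(logits / log_t.exp(), labels)
        loss.backward()
        return loss

    opt.step(closure)
    return float(log_t.exp())

# toy over-confident logits: class signal of scale 5 with comparable noise
labels = torch.randint(0, 3, (1000,))
logits = 5.0 * torch.nn.functional.one_hot(labels, 3).float() + 4.0 * torch.randn(1000, 3)
T = fit_temperature(logits, labels)
print(f"learned temperature T = {T:.2f}")   # T > 1 typically softens over-confident outputs
```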

【5】 Contextual Unsupervised Outlier Detection in Sequences (https://arxiv.org/abs/2111.03808)

Authors: Mohamed A. Zahran, Leonardo Teixeira, Vinayak Rao, Bruno Ribeiro
Affiliations: Department of Computer Science and Department of Statistics, Purdue University, West Lafayette, IN
Note: 11 pages
Abstract: This work proposes an unsupervised learning framework for trajectory (sequence) outlier detection that combines ranking tests with user sequence models. The overall framework identifies sequence outliers at a desired false positive rate (FPR), in an otherwise parameter-free manner. We evaluate our methodology on a collection of real and simulated datasets based on user actions at the websites last.fm and msnbc.com, where we know the ground truth, and demonstrate improved accuracy over existing approaches. We also apply our approach to a large real-world dataset of Pinterest and Facebook users, where we find that users tend to re-share the Pinterest posts of Facebook friends significantly more than those of other types of users, pointing to a potential influence of Facebook friendship on sharing behavior on Pinterest.

【6】 Can semi-supervised learning reduce the amount of manual labelling required for effective radio galaxy morphology classification? (https://arxiv.org/abs/2111.04357)

Authors: Inigo V. Slijepcevic, Anna M. M. Scaife
Affiliations: Department of Physics and Astronomy, University of Manchester, UK
Note: Accepted in: Fourth Workshop on Machine Learning and the Physical Sciences (35th Conference on Neural Information Processing Systems; NeurIPS 2021); final version
Abstract: In this work, we examine the robustness of state-of-the-art semi-supervised learning (SSL) algorithms when applied to morphological classification in modern radio astronomy. We test whether SSL can achieve performance comparable to the current supervised state of the art when using many fewer labelled data points, and whether these results generalise to using truly unlabelled data. We find that although SSL provides additional regularisation, its performance degrades rapidly when using very few labels, and that using truly unlabelled data leads to a significant drop in performance.

Transfer | zero/few/one-shot | adaptation (8 papers)

【1】 Universal and data-adaptive algorithms for model selection in linear contextual bandits (https://arxiv.org/abs/2111.04688)

Authors: Vidya Muthukumar, Akshay Krishnamurthy
Affiliations: Electrical and Computer Engineering and Industrial and Systems Engineering, Georgia Institute of Technology; Microsoft Research, New York City
Note: 27 pages
Abstract: Model selection in contextual bandits is an important complementary problem to regret minimization with respect to a fixed model class. We consider the simplest non-trivial instance of model selection: distinguishing a simple multi-armed bandit problem from a linear contextual bandit problem. Even in this instance, current state-of-the-art methods explore in a suboptimal manner and require strong "feature-diversity" conditions. In this paper, we introduce new algorithms that a) explore in a data-adaptive manner, and b) provide model selection guarantees of the form $\mathcal{O}(d^{\alpha} T^{1-\alpha})$ with no feature-diversity conditions whatsoever, where $d$ denotes the dimension of the linear model and $T$ denotes the total number of rounds. The first algorithm enjoys a "best-of-both-worlds" property, simultaneously recovering two prior results that hold under distinct distributional assumptions. The second removes distributional assumptions altogether, expanding the scope of tractable model selection. Our approach extends to model selection among nested linear contextual bandits under some additional assumptions.

【2】 A Relational Model for One-Shot Classification (https://arxiv.org/abs/2111.04313)

Authors: Arturs Polis, Alexander Ilin
Affiliations: Aalto University, Espoo, Finland
Note: Published at ESANN 2021
Abstract: We show that a deep learning model with a built-in relational inductive bias can benefit sample-efficient learning, without relying on extensive data augmentation. The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention. Our approach perfectly solves the Omniglot one-shot image classification challenge. Our model exceeds human-level accuracy, as well as the previous state of the art, with no data augmentation.

【3】 Mimic: An adaptive algorithm for multivariate time series classification (https://arxiv.org/abs/2111.04273)

Authors: Yuhui Wang, Diane J. Cook
Affiliations: Washington State University
Abstract: Time series data are valuable but often inscrutable. Gaining trust in time series classifiers for finance, healthcare, and other critical applications may rely on creating interpretable models. Researchers have previously been forced to decide between interpretable methods that lack predictive power and deep learning methods that lack transparency. In this paper, we propose a novel Mimic algorithm that retains the predictive accuracy of the strongest classifiers while introducing interpretability. Mimic mirrors the learning method of an existing multivariate time series classifier while simultaneously producing a visual representation that enhances the user's understanding of the learned model. Experiments on 26 time series datasets support Mimic's ability to imitate a variety of time series classifiers visually and accurately.

【4】 Group-Aware Threshold Adaptation for Fair Classification (https://arxiv.org/abs/2111.04271)

Authors: Taeuk Jang, Pengyi Shi, Xiaoqian Wang
Affiliations: School of Electrical and Computer Engineering, Purdue University, West Lafayette, USA; Krannert School of Management, Purdue University, West Lafayette, USA
Note: 19 pages, 1 figure
Abstract: Fairness in machine learning is getting increasing attention as machine learning applications continue to expand and diversify. To mitigate discriminatory model behavior between different demographic groups, we introduce a novel post-processing method that optimizes over multiple fairness constraints through group-aware threshold adaptation. We propose to learn adaptive classification thresholds for each demographic group by optimizing the confusion matrix estimated from the probability distribution of a classification model's output. As we only need an estimated probability distribution of the model output instead of the classification model structure, our post-processing model can be applied to a wide range of classification models, improving fairness in a model-agnostic manner while ensuring privacy. This even allows us to post-process existing fairness methods to further improve the trade-off between accuracy and fairness. Moreover, our model has low computational cost. We provide a rigorous theoretical analysis of the convergence of our optimization algorithm and of the trade-off between accuracy and fairness. Our method theoretically enables a better near-optimality upper bound than the existing method under the same conditions. Experimental results demonstrate that our method outperforms state-of-the-art methods and obtains results closest to the theoretical accuracy-fairness trade-off boundary.
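
A brute-force toy version of the post-processing idea: search a per-group threshold that trades accuracy against a demographic-parity gap. The paper instead optimizes an estimated confusion matrix under multiple fairness constraints, so the objective, anchor rate, and penalty weight below are illustrative assumptions.

```python
import numpy as np

def group_thresholds(scores, labels, groups, grid=None, penalty=5.0):
    """Per-group threshold search: accuracy minus a demographic-parity penalty."""
    grid = np.linspace(0.05, 0.95, 19) if grid is None else grid
    target_rate = float(np.mean(scores >= 0.5))          # anchor positive rate (assumption)
    thresholds = {}
    for g in np.unique(groups):
        m = groups == g
        best_t, best_obj = 0.5, -np.inf
        for t in grid:
            pred = scores[m] >= t
            obj = np.mean(pred == labels[m]) - penalty * abs(np.mean(pred) - target_rate)
            if obj > best_obj:
                best_t, best_obj = t, obj
        thresholds[g] = best_t
    return thresholds

rng = np.random.default_rng(1)
scores = rng.uniform(size=500)
groups = rng.integers(0, 2, size=500)
labels = (scores + 0.15 * groups > 0.6).astype(int)      # group-dependent label bias
print(group_thresholds(scores, labels, groups))          # thresholds differ per group
```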

【5】 Cross-modal Zero-shot Hashing by Label Attributes Embedding (https://arxiv.org/abs/2111.04080)

Authors: Runmin Wang, Guoxian Yu, Lei Liu, Lizhen Cui, Carlotta Domeniconi, Xiangliang Zhang
Note: 7 pages, 2 figures
Abstract: Cross-modal hashing (CMH) is one of the most promising methods for cross-modal approximate nearest neighbor search. Most CMH solutions ideally assume that the labels of the training and testing sets are identical. However, this assumption is often violated, causing a zero-shot CMH problem. Recent efforts to address this issue focus on transferring knowledge from the seen classes to the unseen ones using label attributes. However, the attributes are isolated from the features of the multi-modal data. To reduce the information gap, we introduce an approach called LAEH (Label Attributes Embedding for zero-shot cross-modal Hashing). LAEH first obtains the initial semantic attribute vectors of labels via a word2vec model and then uses a transformation network to transform them into a common subspace. Next, it leverages the hash vectors and the feature similarity matrix to guide the feature extraction networks of the different modalities. At the same time, LAEH uses the attribute similarity as a supplement to the label similarity to rectify the label embedding and the common subspace. Experiments show that LAEH outperforms related representative zero-shot and cross-modal hashing methods.

【6】 Open-Set Crowdsourcing using Multiple-Source Transfer Learning (https://arxiv.org/abs/2111.04073)

Authors: Guangyang Han, Guoxian Yu, Lei Liu, Lizhen Cui, Carlotta Domeniconi, Xiangliang Zhang
Affiliations: College of Computer and Information Sciences, Southwest University, China; School of Software, Shandong University, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, China
Note: 8 pages, 1 figure
Abstract: We raise and define a new crowdsourcing scenario, open set crowdsourcing, where we only know the general theme of an unfamiliar crowdsourcing project and do not know its label space, that is, the set of possible labels. This is still a task annotation problem, but unfamiliarity with the tasks and the label space hampers the modelling of tasks and workers, as well as truth inference. We propose an intuitive solution, OSCrowd. First, OSCrowd integrates crowd-theme-related datasets into a large source domain to facilitate partial transfer learning and approximate the label space inference for these tasks. Next, it assigns weights to each source domain based on category correlation. After this, it uses multiple-source open set transfer learning to model crowd tasks and assign possible annotations. The label space and annotations given by transfer learning are used to guide and standardize crowd workers' annotations. We validate OSCrowd in an online scenario and prove that OSCrowd solves the open set crowdsourcing problem and works better than related crowdsourcing solutions.

【7】 Inferring untrained complex dynamics of delay systems using an adapted echo state network (https://arxiv.org/abs/2111.03706)

Authors: Mirko Goldmann, Claudio R. Mirasso, Ingo Fischer, Miguel C. Soriano
Affiliations: Instituto de Física Interdisciplinar y Sistemas Complejos (IFISC, UIB-CSIC), Campus Universitat de les Illes Balears, Palma de Mallorca, Spain
Abstract: Caused by finite signal propagation velocities, many complex systems feature time delays that may induce high-dimensional chaotic behavior and make forecasting intricate. Here, we propose an echo state network adaptable to the physics of systems with arbitrary delays. After training the network to forecast a system with a unique and sufficiently long delay, it has already learned to predict the system dynamics for all other delays. A simple adaptation of the network's topology allows us to infer untrained features such as high-dimensional chaotic attractors, bifurcations, and even multistabilities that emerge with shorter and longer delays. Thus, the fusion of physical knowledge of the delay system and data-driven machine learning yields a model with high generalization capabilities and unprecedented prediction accuracy.
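
For readers unfamiliar with the base model, here is a minimal echo state network, a fixed random reservoir plus a ridge-regression readout, fitted for one-step-ahead forecasting of a toy signal. The paper's delay-aware topology adaptation is not reproduced, and all hyperparameters are assumptions.

```python
import numpy as np

def esn_fit(series, n_res=300, rho=0.9, washout=100, ridge=1e-6, seed=0):
    """Minimal ESN: drive a random reservoir with the input, fit a linear readout."""
    rng = np.random.default_rng(seed)
    W_in = rng.uniform(-0.5, 0.5, n_res)
    W = rng.normal(size=(n_res, n_res))
    W *= rho / np.max(np.abs(np.linalg.eigvals(W)))      # set spectral radius to rho
    x, states = np.zeros(n_res), []
    for u in series[:-1]:
        x = np.tanh(W @ x + W_in * u)                    # leak-free reservoir update
        states.append(x.copy())
    S, y = np.array(states[washout:]), series[washout + 1:]
    W_out = np.linalg.solve(S.T @ S + ridge * np.eye(n_res), S.T @ y)
    return S @ W_out, y                                  # readout predictions vs. targets

t = np.linspace(0, 60, 2000)
pred, target = esn_fit(np.sin(t) + 0.5 * np.sin(2.3 * t))
print("one-step MSE:", np.mean((pred - target) ** 2))
```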

【8】 RF-Net: a Unified Meta-learning Framework for RF-enabled One-shot Human Activity Recognition (https://arxiv.org/abs/2111.04566)

Authors: Shuya Ding, Zhe Chen, Tianyue Zheng, Jun Luo
Affiliations: School of Computer Science and Engineering, Nanyang Technological University, Singapore
Abstract: Radio-Frequency (RF) based device-free Human Activity Recognition (HAR) is rising as a promising solution for many applications. However, device-free (or contactless) sensing is often more sensitive to environment changes than device-based (or wearable) sensing. Also, RF datasets strictly require on-line labeling during collection, starkly different from image and text data collection, where human interpretation can be leveraged to perform off-line labeling. Therefore, existing solutions to RF-HAR entail a laborious data collection process for adapting to new environments. To this end, we propose RF-Net as a meta-learning based approach to one-shot RF-HAR; it reduces the labeling effort for environment adaptation to the minimum level. In particular, we first examine three representative RF sensing techniques and two major meta-learning approaches. The results motivate us to innovate in two designs: i) a dual-path base HAR network, where both time and frequency domains are dedicated to learning powerful RF features, including spatial and attention-based temporal ones, and ii) a metric-based meta-learning framework to enhance the fast adaptation capability of the base network, including an RF-specific metric module along with a residual classification module. We conduct extensive experiments based on all three RF sensing techniques in multiple real-world indoor environments; all results strongly demonstrate the efficacy of RF-Net compared with state-of-the-art baselines.

Reinforcement learning (12 papers)

【1】 Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning (https://arxiv.org/abs/2111.04714)

Authors: Kajetan Schweighofer, Markus Hofmarcher, Marius-Constantin Dinu, Philipp Renz, Angela Bitto-Nemling, Vihang Patil, Sepp Hochreiter
Affiliations: ELLIS Unit Linz and LIT AI Lab, Institute for Machine Learning, Johannes Kepler University Linz, Austria; Dynatrace Research, Austria; Institute of Advanced Research in Artificial Intelligence (IARAI)
Note: Code: this https URL
Abstract: In the real world, affecting the environment with a weak policy can be expensive or very risky, which hampers real-world applications of reinforcement learning. Offline Reinforcement Learning (RL) can learn policies from a given dataset without interacting with the environment. However, the dataset is the only source of information for an offline RL algorithm and determines the performance of the learned policy. We still lack studies on how dataset characteristics influence different offline RL algorithms. Therefore, we conducted a comprehensive empirical analysis of how dataset characteristics affect the performance of offline RL algorithms for discrete action environments. A dataset is characterized by two metrics: (1) the average dataset return, measured by the Trajectory Quality (TQ), and (2) the coverage, measured by the State-Action Coverage (SACo). We found that variants of the off-policy Deep Q-Network family require datasets with high SACo to perform well. Algorithms that constrain the learned policy towards the given dataset perform well for datasets with high TQ or SACo. For datasets with high TQ, Behavior Cloning outperforms or performs similarly to the best offline RL algorithms.
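
The two dataset metrics can be sketched directly. The exact normalizations (e.g., of returns against reference policies, or of unique pair counts against dataset size) are not specified in the abstract, so the ones below are assumptions.

```python
import numpy as np

def trajectory_quality(episode_returns):
    """TQ: average episode return of the dataset (normalization omitted)."""
    return float(np.mean(episode_returns))

def state_action_coverage(states, actions):
    """SACo: unique (state, action) pairs relative to dataset size (assumed ratio form)."""
    pairs = {(tuple(np.round(s, 3)), int(a)) for s, a in zip(states, actions)}
    return len(pairs) / len(actions)

rng = np.random.default_rng(0)
states = rng.integers(0, 20, size=(5000, 2)).astype(float)   # 5000 toy transitions
actions = rng.integers(0, 4, size=5000)                      # discrete actions
episode_returns = rng.normal(1.0, 0.3, size=200)             # 200 toy episodes
print(trajectory_quality(episode_returns), state_action_coverage(states, actions))
```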

【2】 Interactive Inverse Reinforcement Learning for Cooperative Games (https://arxiv.org/abs/2111.04698)

Authors: Thomas Kleine Buening, Anne-Marie George, Christos Dimitrakakis
Affiliations: University of Oslo
Abstract: We study the problem of designing AI agents that can learn to cooperate effectively with a potentially suboptimal partner while having no access to the joint reward function. This problem is modeled as a cooperative episodic two-agent Markov decision process. We assume control over only the first of the two agents in a Stackelberg formulation of the game, where the second agent is acting so as to maximise expected utility given the first agent's policy. How should the first agent act so that it learns the joint reward function as quickly as possible and the joint policy is as close to optimal as possible? In this paper, we analyse how knowledge about the reward function can be gained in this interactive two-agent scenario. We show that when the learning agent's policies have a significant effect on the transition function, the reward function can be learned efficiently.

【3】 Reinforcement Learning for Mixed Autonomy Intersections 标题:混合自主交叉口的强化学习 链接:https://arxiv.org/abs/2111.04686

作者:Zhongxia Yan,Cathy Wu 机构:Massachusetts Institute of Technology 备注:None 摘要:我们提出了一种无模型强化学习方法,用于在由仅含直行交通的双向与四向交叉口构成的模拟交通网络中控制混合自主交通。该方法利用多智能体策略分解,允许基于局部观测对任意数量的受控车辆进行分散控制。我们证明,即使没有奖励塑造,强化学习也能学会协调车辆、表现出类似交通信号的行为,在33-50%受控车辆的情况下实现接近最优的吞吐量。借助多任务学习和迁移学习,我们证明了这种行为在不同流入率和交通网络规模上的泛化能力。我们的代码、模型和结果视频可在 https://github.com/ZhongxiaYan/mixed_autonomy_intersections 获取。 摘要:We propose a model-free reinforcement learning method for controlling mixed autonomy traffic in simulated traffic networks with through-traffic-only two-way and four-way intersections. Our method utilizes multi-agent policy decomposition which allows decentralized control based on local observations for an arbitrary number of controlled vehicles. We demonstrate that, even without reward shaping, reinforcement learning learns to coordinate the vehicles to exhibit traffic signal-like behaviors, achieving near-optimal throughput with 33-50% controlled vehicles. With the help of multi-task learning and transfer learning, we show that this behavior generalizes across inflow rates and size of the traffic network. Our code, models, and videos of results are available at https://github.com/ZhongxiaYan/mixed_autonomy_intersections.

【4】 Improving RNA Secondary Structure Design using Deep Reinforcement Learning 标题:利用深度强化学习改进RNA二级结构设计 链接:https://arxiv.org/abs/2111.04504

作者:Alexander Whatley,Zhekun Luo,Xiangru Tang 机构:Department of Electrical Engineering and Computer Sciences, University of California, Berkeley 摘要:近年来开发新药和治疗方法的成本不断上升,促使生物分子设计优化技术得到了广泛研究。目前,生物分子设计中应用最广泛的方法是定向进化,这是一种模拟生物进化的贪婪爬山算法。在本文中,我们提出了一个将强化学习应用于RNA序列设计的新基准,其中目标函数被定义为序列二级结构的自由能。除了对标准库中每种强化学习算法的原始实现进行实验外,我们还分析了每种算法的变体,修改其奖励函数并调节模型的超参数。我们展示了对这些算法进行消融分析的结果,以及反映算法在各批次上的性能及其搜索RNA序列可能空间能力的图表。我们发现DQN算法在该设定下表现最佳,这与先前工作中PPO在所有被测算法中表现最佳的结果形成对比。我们的结果应当会引起生物分子设计界的兴趣,并可作为未来将机器学习用于分子设计实验的基线。 摘要:Rising costs in recent years of developing new drugs and treatments have led to extensive research in optimization techniques in biomolecular design. Currently, the most widely used approach in biomolecular design is directed evolution, which is a greedy hill-climbing algorithm that simulates biological evolution. In this paper, we propose a new benchmark of applying reinforcement learning to RNA sequence design, in which the objective function is defined to be the free energy in the sequence's secondary structure. In addition to experimenting with the vanilla implementations of each reinforcement learning algorithm from standard libraries, we analyze variants of each algorithm in which we modify the algorithm's reward function and tune the model's hyperparameters. We show results of the ablation analysis that we do for these algorithms, as well as graphs indicating the algorithm's performance across batches and its ability to search the possible space of RNA sequences. We find that our DQN algorithm performs by far the best in this setting, contrasting with prior work in which PPO performs the best among all tested algorithms. Our results should be of interest to those in the biomolecular design community and should serve as a baseline for future experiments involving machine learning in molecule design.
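作为示意,下面的片段演示了如何把"二级结构的自由能"作为RL的奖励信号:用ViennaRNA的Python绑定RNA.fold计算最小自由能(MFE),环境接口是笔者虚构的极简骨架,并非论文的原始环境:

```python
import RNA  # ViennaRNA 的 Python 绑定,需预先安装

BASES = "ACGU"

class RNADesignEnv:
    """极简示意环境:逐位置生成碱基,序列完成时以负自由能作为奖励。"""
    def __init__(self, length):
        self.length = length
        self.seq = ""

    def reset(self):
        self.seq = ""
        return self.seq

    def step(self, action):
        self.seq += BASES[action]        # action ∈ {0, 1, 2, 3}
        done = len(self.seq) == self.length
        reward = 0.0
        if done:
            _, mfe = RNA.fold(self.seq)  # 返回 (二级结构, 最小自由能)
            reward = -mfe                # 自由能越低,奖励越高
        return self.seq, reward, done
```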

【5】 Batch Reinforcement Learning from Crowds 标题:群体中的批量强化学习 链接:https://arxiv.org/abs/2111.04279

作者:Guoxi Zhang,Hisashi Kashima 机构:Graduate School of Informatics, Kyoto University 备注:16 pages 摘要:批量强化学习的一个缺点是它需要数据中带有奖励,因此不适用于没有奖励函数的任务。现有缺乏奖励的设定(如行为克隆)依赖于从人类收集的最优演示。不幸的是,确保最优性需要深厚的专业知识,这阻碍了为复杂任务获取大规模数据。本文通过从偏好中学习奖励函数,解决批量强化学习设定中缺乏奖励的问题。生成偏好只需要对任务有基本的了解;作为一个心理过程,生成偏好也比执行演示更快,因此可以借助众包从非专家人群中大规模收集偏好。本文解决了从非专家人群收集数据时出现的一个关键挑战:偏好中的噪声。我们提出了一种新的标签可靠性概率建模方法,该方法协同地利用各标注者的标签;此外,该模型还用学到的奖励函数对估计进行平滑。在Atari数据集上的评估证明了所提模型的有效性,随后的消融研究分析了各项设计的相对重要性。 摘要:A shortcoming of batch reinforcement learning is its requirement for rewards in data, thus not applicable to tasks without reward functions. Existing settings for lack of reward, such as behavioral cloning, rely on optimal demonstrations collected from humans. Unfortunately, extensive expertise is required for ensuring optimality, which hinder the acquisition of large-scale data for complex tasks. This paper addresses the lack of reward in a batch reinforcement learning setting by learning a reward function from preferences. Generating preferences only requires a basic understanding of a task. Being a mental process, generating preferences is faster than performing demonstrations. So preferences can be collected at scale from non-expert humans using crowdsourcing. This paper tackles a critical challenge that emerged when collecting data from non-expert humans: the noise in preferences. A novel probabilistic model is proposed for modelling the reliability of labels, which utilizes labels collaboratively. Moreover, the proposed model smooths the estimation with a learned reward function. Evaluation on Atari datasets demonstrates the effectiveness of the proposed model, followed by an ablation study to analyze the relative importance of the proposed ideas.
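从偏好中学习奖励函数的常见做法是Bradley-Terry型模型:若标注者认为片段σ¹优于σ²,则最大化相应的对数似然。下面是一个PyTorch示意实现(RewardNet的结构为假设;论文中针对标注者可靠性的概率建模与平滑并未包含在内):

```python
import torch
import torch.nn as nn

class RewardNet(nn.Module):
    """逐步奖励模型:输入 [B, T, obs_dim] 的轨迹片段,输出每步奖励 [B, T]。"""
    def __init__(self, obs_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 1))

    def forward(self, segs):
        return self.net(segs).squeeze(-1)

def preference_loss(reward_model, seg1, seg2, pref):
    """Bradley-Terry 偏好损失:pref=1 表示标注者偏好 seg1。"""
    r1 = reward_model(seg1).sum(dim=1)  # 两段片段的累计预测奖励 [B]
    r2 = reward_model(seg2).sum(dim=1)
    # P(seg1 优于 seg2) = exp(r1) / (exp(r1) + exp(r2)) = sigmoid(r1 - r2)
    return nn.functional.binary_cross_entropy_with_logits(r1 - r2,
                                                          pref.float())
```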

【6】 Data-Efficient Deep Reinforcement Learning for Attitude Control of Fixed-Wing UAVs: Field Experiments 标题:数据高效的深度强化学习用于固定翼无人机姿态控制:飞行实验 链接:https://arxiv.org/abs/2111.04153

作者:Eivind Bøhn,Erlend M. Coates,Dirk Reinhardt,Tor Arne Johansen 机构:Department of Mathematics and Cybernetics, SINTEF DIGITAL, Oslo, Norway; Centre for Autonomous Marine Operations and Systems, Department of Engineering Cybernetics, NTNU, Trondheim, Norway 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 摘要:固定翼无人机(UAV)的姿态控制是一个困难的控制问题,部分原因在于不确定的非线性动力学、执行器约束以及纵向与横向运动的耦合。目前最先进的自动驾驶仪基于线性控制,因此其有效性和性能受到限制。深度强化学习(DRL)是一种通过与被控系统交互来自动发现最优控制律的机器学习方法,能够处理复杂的非线性动力学。我们在本文中表明,DRL可以直接在原始非线性动力学上成功学习固定翼无人机的姿态控制,只需要三分钟的飞行数据。我们首先在仿真环境中训练模型,然后在无人机飞行试验中部署学到的控制器,在无需进一步在线学习的情况下,展示了与最先进的ArduPlane比例-积分-微分(PID)姿态控制器相当的性能。为了更好地理解所学控制器的工作方式,我们对其行为进行了分析,包括与现有调校良好的PID控制器的比较。 摘要:Attitude control of fixed-wing unmanned aerial vehicles (UAVs) is a difficult control problem in part due to uncertain nonlinear dynamics, actuator constraints, and coupled longitudinal and lateral motions. Current state-of-the-art autopilots are based on linear control and are thus limited in their effectiveness and performance. Deep reinforcement learning (DRL) is a machine learning method to automatically discover optimal control laws through interaction with the controlled system, that can handle complex nonlinear dynamics. We show in this paper that DRL can successfully learn to perform attitude control of a fixed-wing UAV operating directly on the original nonlinear dynamics, requiring as little as three minutes of flight data. We initially train our model in a simulation environment and then deploy the learned controller on the UAV in flight tests, demonstrating comparable performance to the state-of-the-art ArduPlane proportional-integral-derivative (PID) attitude controller with no further online learning required. To better understand the operation of the learned controller we present an analysis of its behaviour, including a comparison to the existing well-tuned PID controller.

【7】 Optimization of the Model Predictive Control Meta-Parameters Through Reinforcement Learning 标题:基于强化学习的模型预测控制元参数优化 链接:https://arxiv.org/abs/2111.04146

作者:Eivind Bøhn,Sebastien Gros,Signe Moe,Tor Arne Johansen 机构:Artificial Intelligence and Data Analytics, SINTEF Digital, Oslo, Norway; Department of Engineering Cybernetics, NTNU, Trondheim, Norway; Sopra Steria Applications, Oslo, Norway; Centre for Autonomous Marine Operations and Systems, NTNU, Trondheim, Norway 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 摘要:模型预测控制(MPC)越来越多地被考虑用于快速系统和嵌入式应用的控制。然而,MPC在此类系统上面临一些重大挑战:其高计算复杂度导致控制算法的高功耗,在电池供电的嵌入式系统中可能占据相当大的能源份额;MPC参数必须调优,而这在很大程度上是一个反复试错的过程,并极大地影响控制器的控制性能、鲁棒性和计算复杂度。在本文中,我们提出了一个新框架,其中控制算法的任何参数都可以使用强化学习(RL)联合调优,目标是同时优化控制算法的控制性能和功耗。我们提出了用RL优化MPC元参数的新思想,即优化影响MPC问题结构的参数,而非给定问题的解。我们的控制算法基于事件触发MPC:我们学习MPC应在何时重新计算,并在两次MPC计算之间采用双模MPC与线性状态反馈控制律。我们提出了一种新颖的混合分布策略,并表明通过联合优化,可以获得单独优化相同参数时无法取得的改进。我们在倒立摆控制任务上演示了该框架,将控制系统的总计算时间减少了36%,同时与性能最佳的MPC基线相比,控制性能还提高了18.4%。 摘要:Model predictive control (MPC) is increasingly being considered for control of fast systems and embedded applications. However, the MPC has some significant challenges for such systems. Its high computational complexity results in high power consumption from the control algorithm, which could account for a significant share of the energy resources in battery-powered embedded systems. The MPC parameters must be tuned, which is largely a trial-and-error process that affects the control performance, the robustness and the computational complexity of the controller to a high degree. In this paper, we propose a novel framework in which any parameter of the control algorithm can be jointly tuned using reinforcement learning (RL), with the goal of simultaneously optimizing the control performance and the power usage of the control algorithm. We propose the novel idea of optimizing the meta-parameters of MPC with RL, i.e. parameters affecting the structure of the MPC problem as opposed to the solution to a given problem. Our control algorithm is based on an event-triggered MPC where we learn when the MPC should be re-computed, and a dual mode MPC and linear state feedback control law applied in between MPC computations. We formulate a novel mixture-distribution policy and show that with joint optimization we achieve improvements that do not present themselves when optimizing the same parameters in isolation. We demonstrate our framework on the inverted pendulum control task, reducing the total computation time of the control system by 36% while also improving the control performance by 18.4% over the best-performing MPC baseline.
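文中"事件触发MPC+中间段线性状态反馈"的控制循环结构可以用如下Python代码示意(trigger_policy、solve_mpc均为占位函数;真实系统中trigger_policy由RL学得,这里只演示结构):

```python
import numpy as np

def control_loop(x0, steps, trigger_policy, solve_mpc, K, f):
    """事件触发 MPC 示意循环:f 为系统一步动力学,K 为线性反馈增益。"""
    x, plan, x_ref, t_plan = x0, None, x0, 0
    for t in range(steps):
        if plan is None or trigger_policy(x, t):       # 学到的触发决策(占位)
            plan, x_ref, t_plan = solve_mpc(x), x, t   # 重新求解 MPC(代价高)
        k = min(t - t_plan, len(plan) - 1)
        u = plan[k] + K @ (x - x_ref)                  # 开环计划 + 反馈修正
        x = f(x, u)
    return x

# 玩具用法:双积分器,零输入“MPC 计划”,每 10 步触发一次重算
f = lambda x, u: np.array([x[0] + 0.1 * x[1], x[1] + 0.1 * u[0]])
x_final = control_loop(np.array([1.0, 0.0]), 50,
                       lambda x, t: t % 10 == 0,
                       lambda x: [np.zeros(1)] * 10,
                       np.array([[-0.5, -0.5]]), f)
print(x_final)
```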

【8】 DQRE-SCnet: A novel hybrid approach for selecting users in Federated Learning with Deep-Q-Reinforcement Learning based on Spectral Clustering 标题:DQRE-SCnet:一种基于谱聚类与深度Q强化学习的联邦学习用户选择混合新方法 链接:https://arxiv.org/abs/2111.04105

作者:Mohsen Ahmadi,Ali Taghavirashidizadeh,Danial Javaheri,Armin Masoumian,Saeid Jafarzadeh Ghoushchi,Yaghoub Pourasad 机构:Department of Industrial Engineering, Urmia University of Technology (UUT), Urmia, Iran; Department of Electrical and Electronic Engineering, Islamic Azad University, Central Tehran Branch (IAUCTB), Tehran, Iran 备注:14 pages, Accepted, Elsevier (Journal of King Saud University - Computer and Information Sciences) 摘要:基于现实世界敏感数据的机器学习模型有望在从医学筛查到疾病暴发、农业、工业、国防科学等诸多领域带来进展。在许多应用中,参与学习的各方受益于收集各自的私有数据集、在真实数据上训练精细的机器学习模型,并共享使用这些模型的好处。出于隐私和安全方面的顾虑,大多数人避免共享敏感数据用于训练。联邦学习无需每个用户向中央服务器暴露其本地数据,即可让各方在共享的数据上共同训练机器学习算法。这种保护隐私的协作学习方式的代价是训练期间可观的通信开销。大多数大规模机器学习应用需要基于在各种设备和场所产生的数据集进行分散式学习,而这类数据集是分散式学习的一大障碍:各异的环境造成了不同设备和地点之间数据分布的显著差异。研究人员已提出多种在联邦学习系统中实现数据隐私的方法,但同质的本地数据仍然存在挑战。本研究的思路是在联邦学习中挑选共享数据的节点(用户),以实现基于独立数据的均衡,从而提高精度、缩短训练时间并加快收敛。为此,本研究提出了一种基于谱聚类的深度Q强化学习集成方法,称为DQRE-SCnet,用于在每个通信轮次中选择设备子集。结果表明,该方法能够减少联邦学习所需的通信轮数。 摘要:Machine learning models based on sensitive data in the real-world promise advances in areas ranging from medical screening to disease outbreaks, agriculture, industry, defense science, and more. In many applications, learning participant communication rounds benefit from collecting their own private data sets, teaching detailed machine learning models on the real data, and sharing the benefits of using these models. Due to existing privacy and security concerns, most people avoid sensitive data sharing for training. Without each user demonstrating their local data to a central server, Federated Learning allows various parties to train a machine learning algorithm on their shared data jointly. This method of collective privacy learning results in the expense of important communication during training. Most large-scale machine-learning applications require decentralized learning based on data sets generated on various devices and places. Such datasets represent an essential obstacle to decentralized learning, as their diverse contexts contribute to significant differences in the delivery of data across devices and locations. Researchers have proposed several ways to achieve data privacy in Federated Learning systems. However, there are still challenges with homogeneous local data. This research approach is to select nodes (users) to share their data in Federated Learning for independent data-based equilibrium to improve accuracy, reduce training time, and increase convergence. Therefore, this research presents a combined Deep-Q Reinforcement Learning Ensemble based on Spectral Clustering called DQRE-SCnet to choose a subset of devices in each communication round. Based on the results, it has been displayed that it is possible to decrease the number of communication rounds needed in Federated Learning.
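"按簇选取参与设备"这一思路可以用scikit-learn的谱聚类给出最小示意(客户端特征、簇数与每簇随机抽取的规则均为假设;DQRE-SCnet中的实际选取由深度Q网络学习得到):

```python
import numpy as np
from sklearn.cluster import SpectralClustering

rng = np.random.default_rng(0)
client_feats = rng.normal(size=(100, 8))   # 假设:每个客户端一个特征向量

labels = SpectralClustering(n_clusters=5, affinity="rbf",
                            random_state=0).fit_predict(client_feats)

# 每个簇中随机抽取至多 2 个客户端参与本轮通信(示意;原文用 DQN 做选择)
selected = []
for c in range(5):
    members = np.where(labels == c)[0]
    selected.extend(rng.choice(members, size=min(2, members.size),
                               replace=False))
print(sorted(selected))
```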

【9】 A Deep Reinforcement Learning Approach for Composing Moving IoT Services 标题:一种用于移动物联网服务组合的深度强化学习方法 链接:https://arxiv.org/abs/2111.03967

作者:Azadeh Ghari Neiat,Athman Bouguettaya,Mohammed Bahutair 机构:School of Information Technology, Deakin University; School of Computer Science, University of Sydney 摘要:我们开发了一个新颖的框架,用于高效且有效地发现在一段时间内在用户附近移动的众包服务。我们引入了移动众包服务模型,将其建模为一个移动区域。我们提出一种基于深度强化学习的组合方法,在考虑质量参数的情况下选择并组合移动物联网服务。此外,我们还开发了一种基于并行群(flock)的服务发现算法,作为衡量所提方法准确性的基准真值(ground-truth)。在两个真实数据集上的实验验证了该基于深度强化学习方法的有效性和效率。 摘要:We develop a novel framework for efficiently and effectively discovering crowdsourced services that move in close proximity to a user over a period of time. We introduce a moving crowdsourced service model which is modelled as a moving region. We propose a deep reinforcement learning-based composition approach to select and compose moving IoT services considering quality parameters. Additionally, we develop a parallel flock-based service discovery algorithm as a ground-truth to measure the accuracy of the proposed approach. The experiments on two real-world datasets verify the effectiveness and efficiency of the deep reinforcement learning-based approach.

【10】 Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning 标题:风险敏感强化学习的指数Bellman方程和改进的遗憾界 链接:https://arxiv.org/abs/2111.03947

作者:Yingjie Fei,Zhuoran Yang,Yudong Chen,Zhaoran Wang 机构: Northwestern University, Princeton University, University of Wisconsin-Madison 摘要:我们研究了基于熵风险测度的风险敏感强化学习(RL)。虽然现有的工作已经为这个问题建立了非渐近后悔保证,但它们在上界和下界之间留下了一个指数差距。我们确定了现有算法及其分析中的缺陷,这些缺陷导致了这种差距。为了弥补这些不足,我们研究了风险敏感Bellman方程的简单变换,我们称之为指数Bellman方程。指数Bellman方程激励我们对风险敏感RL算法中的Bellman备份过程进行新的分析,并进一步推动新探索机制的设计。我们表明,这些分析和算法创新一起导致改进的后悔上界。 摘要:We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.
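为便于理解,这里给出熵风险测度与指数Bellman方程的示意形式(以有限时域、固定策略的表格型设定为例;记号为通用写法,细节以论文为准):

```latex
% 熵风险测度(风险参数 \beta \neq 0):
\rho_{\beta}(X) = \frac{1}{\beta} \log \mathbb{E}\big[ e^{\beta X} \big]

% 指数Bellman方程:对指数化的值函数进行线性备份
e^{\beta V_h(s)} = \mathbb{E}_{a \sim \pi_h(\cdot \mid s),\; s' \sim P_h(\cdot \mid s, a)}
\Big[ e^{\beta \left( r_h(s, a) + V_{h+1}(s') \right)} \Big]
```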

【11】 Robust Deep Reinforcement Learning for Quadcopter Control 标题:用于四轴飞行器控制的鲁棒深度强化学习 链接:https://arxiv.org/abs/2111.03915

作者:Aditya M. Deshpande,Ali A. Minai,Manish Kumar 机构:University of Cincinnati, Clifton Ave., Cincinnati, Ohio 备注:6 pages; 3 Figures; Accepted in this https URL 摘要:深度强化学习(RL)使得利用神经网络作为函数逼近器来求解复杂的机器人问题成为可能。然而,在平稳环境中训练的策略在从一个环境迁移到另一个环境时会出现泛化问题。在这项工作中,我们使用鲁棒马尔可夫决策过程(RMDP)来训练无人机控制策略,它结合了鲁棒控制与RL的思想,采用悲观优化来应对策略在环境间迁移时的潜在差距。训练得到的控制策略在四旋翼位置控制任务上进行了测试。RL智能体在MuJoCo模拟器中训练;测试时使用(训练中未见过的)不同环境参数来验证所训练策略跨环境迁移的鲁棒性。在这些环境中,鲁棒策略的表现优于标准智能体,表明增加的鲁棒性提升了通用性,并能适应非平稳环境。代码:https://github.com/adipandas/gym_multirotor 摘要:Deep reinforcement learning (RL) has made it possible to solve complex robotics problems using neural networks as function approximators. However, the policies trained on stationary environments suffer in terms of generalization when transferred from one environment to another. In this work, we use Robust Markov Decision Processes (RMDP) to train the drone control policy, which combines ideas from Robust Control and RL. It opts for pessimistic optimization to handle potential gaps between policy transfer from one environment to another. The trained control policy is tested on the task of quadcopter positional control. RL agents were trained in a MuJoCo simulator. During testing, different environment parameters (unseen during the training) were used to validate the robustness of the trained policy for transfer from one environment to another. The robust policy outperformed the standard agents in these environments, suggesting that the added robustness increases generality and can adapt to non-stationary environments. Codes: https://github.com/adipandas/gym_multirotor
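RMDP的核心是对不确定的转移核取最坏情况,其鲁棒Bellman方程的标准形式如下(示意;$\mathcal{P}(s,a)$表示转移概率的不确定集):

```latex
V(s) = \max_{a \in \mathcal{A}} \; \min_{P \in \mathcal{P}(s, a)}
\Big[ r(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \Big]
```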

【12】 d3rlpy: An Offline Deep Reinforcement Learning Library 标题:d3rlpy:一个离线深度强化学习库 链接:https://arxiv.org/abs/2111.03788

作者:Takuma Seno,Michita Imai 机构:Keio University, Sony AI 备注:Accepted at Offline Reinforcement Learning Workshop at Neural Information Processing Systems, 2021 摘要:在本文中,我们介绍了d3rlpy,一个面向Python的开源离线深度强化学习(RL)库。d3rlpy通过用户友好的API支持众多离线深度RL算法以及在线算法。为了辅助深度RL的研究和开发项目,d3rlpy提供了实用且独特的功能,如数据收集、导出部署策略、预处理和后处理、分布型Q函数(distributional Q-functions)、多步学习以及便捷的命令行界面。此外,d3rlpy还提供了一个新颖的图形界面,使用户无需编写程序即可训练离线RL算法。最后,所实现的算法在D4RL数据集上进行了基准测试,以确保实现质量。d3rlpy的源代码可在GitHub上找到:https://github.com/takuseno/d3rlpy。 摘要:In this paper, we introduce d3rlpy, an open-sourced offline deep reinforcement learning (RL) library for Python. d3rlpy supports a number of offline deep RL algorithms as well as online algorithms via a user-friendly API. To assist deep RL research and development projects, d3rlpy provides practical and unique features such as data collection, exporting policies for deployment, preprocessing and postprocessing, distributional Q-functions, multi-step learning and a convenient command-line interface. Furthermore, d3rlpy additionally provides a novel graphical interface that enables users to train offline RL algorithms without coding programs. Lastly, the implemented algorithms are benchmarked with D4RL datasets to ensure the implementation quality. The d3rlpy source code can be found on GitHub: \url{https://github.com/takuseno/d3rlpy}.
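一个按d3rlpy 1.x文档风格编写的最小用法示意如下(不同版本的API可能有差异,请以官方文档为准):

```python
import d3rlpy

# 内置离线数据集:CartPole 的 MDPDataset 及对应环境
dataset, env = d3rlpy.datasets.get_cartpole()

# 以离散动作版 CQL 为例训练一个离线 RL 算法
cql = d3rlpy.algos.DiscreteCQL()
cql.fit(dataset, n_epochs=1)

# 部署:对单个观测做贪心动作预测
obs = env.reset()
action = cql.predict([obs])[0]
```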

元学习(3篇)

【1】 MetaMIML: Meta Multi-Instance Multi-Label Learning 标题:MetaMIML:元多实例多标签学习 链接:https://arxiv.org/abs/2111.04112

作者:Yuanlin Yang,Guoxian Yu,Jun Wang,Lei Liu,Carlotta Domeniconi,Maozu Guo 机构:College of Computer and Information Sciences, Southwest University, Chongqing, China; School of Software, Shandong University, Jinan, China; Joint SDU-NTU Centre for Artificial Intelligence Research, Shandong University, Jinan, China 备注:10 pages, 2 figures 摘要:多实例多标签学习(MIML)为复杂对象(包)建模:每个对象与一组相互关联的标签相关联,并由一组实例组成。当前的MIML解决方案仍聚焦于单一类型的对象,并假设训练数据是独立同分布(IID)的。但这些对象会与其他类型的对象相互链接(例如,Facebook中与不同用户相链接的图片),而这些链接对象同样编码了目标对象的语义。此外,现有方案通常需要大量带标签数据进行训练。为了有效挖掘相互依赖的不同类型MIML对象,我们提出了一种基于网络嵌入和元学习的方法(MetaMIML)。MetaMIML引入带网络嵌入的上下文学习器来捕获不同类型对象的语义信息,并引入任务学习器来提取元知识以快速适应新任务。这样,MetaMIML不仅能在数据层面自然地处理MIML对象,还能在模型层面发挥元学习的威力。在基准数据集上的实验表明,MetaMIML取得了显著优于最先进算法的性能。 摘要:Multi-Instance Multi-Label learning (MIML) models complex objects (bags), each of which is associated with a set of interrelated labels and composed with a set of instances. Current MIML solutions still focus on a single-type of objects and assumes an IID distribution of training data. But these objects are linked with objects of other types (i.e., pictures in Facebook linked with various users), which also encode the semantics of target objects. In addition, they generally need abundant labeled data for training. To effectively mine interdependent MIML objects of different types, we propose a network embedding and meta learning based approach (MetaMIML). MetaMIML introduces the context learner with network embedding to capture semantic information of objects of different types, and the task learner to extract the meta knowledge for fast adapting to new tasks. In this way, MetaMIML can naturally deal with MIML objects at the data level, but also exploit the power of meta-learning at the model level. Experiments on benchmark datasets demonstrate that MetaMIML achieves a significantly better performance than state-of-the-art algorithms.

【2】 Meta Cross-Modal Hashing on Long-Tailed Data 标题:长尾数据上的元跨模态哈希 链接:https://arxiv.org/abs/2111.04086

作者:Runmin Wang,Guoxian Yu,Carlotta Domeniconi,Xiangliang Zhang 机构:Shandong University; Department of Computer Science, George Mason University 备注:10 pages, 4 figures 摘要:由于跨模态哈希在降低存储开销的同时加快了大规模异构数据的查询速度,它在多模态数据的近似最近邻搜索中得到了广泛研究。大多数哈希方法假设训练数据是类别平衡的,然而在实践中,真实世界的数据通常呈长尾分布。在本文中,我们提出了一种基于元学习的跨模态哈希方法(MetaCMH)来处理长尾数据。由于尾部类别缺少训练样本,MetaCMH首先从不同模态的数据中学习直接特征,然后引入联想记忆模块来学习尾部类别样本的记忆特征,再将直接特征与记忆特征结合,得到每个样本的元特征。对于长尾分布中的头部类别样本,直接特征的权重较大,因为有足够的训练数据可以将其学好;而对于稀有类别,记忆特征的权重更大。最后,MetaCMH使用似然损失函数来保持不同模态间的相似性,并以端到端的方式学习哈希函数。在长尾数据集上的实验表明,MetaCMH的性能明显优于最先进的方法,尤其是在尾部类别上。 摘要:Due to the advantage of reducing storage while speeding up query time on big heterogeneous data, cross-modal hashing has been extensively studied for approximate nearest neighbor search of multi-modal data. Most hashing methods assume that training data is class-balanced. However, in practice, real world data often have a long-tailed distribution. In this paper, we introduce a meta-learning based cross-modal hashing method (MetaCMH) to handle long-tailed data. Due to the lack of training samples in the tail classes, MetaCMH first learns direct features from data in different modalities, and then introduces an associative memory module to learn the memory features of samples of the tail classes. It then combines the direct and memory features to obtain meta features for each sample. For samples of the head classes of the long tail distribution, the weight of the direct features is larger, because there are enough training data to learn them well; while for rare classes, the weight of the memory features is larger. Finally, MetaCMH uses a likelihood loss function to preserve the similarity in different modalities and learns hash functions in an end-to-end fashion. Experiments on long-tailed datasets show that MetaCMH performs significantly better than state-of-the-art methods, especially on the tail classes.

【3】 Crowdsourcing with Meta-Workers: A New Way to Save the Budget 标题:元员工众包:节约预算的新途径 链接:https://arxiv.org/abs/2111.04068

作者:Guangyang Han,Guoxian Yu,Lizhen Cui,Carlotta Domeniconi,Xiangliang Zhang 备注:11 pages, 6 figures 摘要:由于互联网工作者的不可靠性,很难令人满意地完成众包项目,特别是在任务繁多而预算有限的情况下。近年来,元学习为小样本学习带来了新的活力,使得仅用少量训练样本就能得到性能不错的分类器。这里我们引入"元工作者"(meta-worker)的概念:一种通过元学习训练、面向适合AI的任务类型(如图像分类)的机器标注器。与普通众包工作者不同,元工作者可以是可靠、稳定的,更重要的是,不知疲倦且免费。我们首先对未标记数据进行聚类,并请众包工作者对聚类中心附近的实例进行重复标注;然后利用这些标注数据和元训练数据集,用不同的元学习算法构建一组元工作者。随后,由元工作者标注剩余的众包任务。我们用Jensen-Shannon散度来度量元工作者所给标注之间的分歧,并据此决定是否需要邀请众包工作者对同一任务做进一步标注。最后,我们对元工作者的偏好建模,并通过加权多数投票计算共识标注。我们的实证研究证实,通过结合机器智能与人类智能,我们能够以低于最先进任务分配方法的预算完成众包项目,同时获得更优或相当的质量。 摘要:Due to the unreliability of Internet workers, it's difficult to complete a crowdsourcing project satisfactorily, especially when the tasks are multiple and the budget is limited. Recently, meta learning has brought new vitality to few-shot learning, making it possible to obtain a classifier with a fair performance using only a few training samples. Here we introduce the concept of \emph{meta-worker}, a machine annotator trained by meta learning for types of tasks (i.e., image classification) that are well-fit for AI. Unlike regular crowd workers, meta-workers can be reliable, stable, and more importantly, tireless and free. We first cluster unlabeled data and ask crowd workers to repeatedly annotate the instances nearby the cluster centers; we then leverage the annotated data and meta-training datasets to build a cluster of meta-workers using different meta learning algorithms. Subsequently, meta-workers are asked to annotate the remaining crowdsourced tasks. The Jensen-Shannon divergence is used to measure the disagreement among the annotations provided by the meta-workers, which determines whether or not crowd workers should be invited for further annotation of the same task. Finally, we model meta-workers' preferences and compute the consensus annotation by weighted majority voting. Our empirical study confirms that, by combining machine and human intelligence, we can accomplish a crowdsourcing project with a lower budget than state-of-the-art task assignment methods, while achieving a superior or comparable quality.
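文中以Jensen-Shannon散度度量元工作者标注之间的分歧;下面给出基于scipy的最小实现(两两取平均与阈值判断的写法为示意):

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def annotation_disagreement(label_dists):
    """label_dists: 各元工作者在类别上的预测分布,形状 [W, C]。
    返回两两 JS 散度的平均值,作为分歧度的一种示意度量。"""
    W = len(label_dists)
    divs = [jensenshannon(label_dists[i], label_dists[j]) ** 2  # 距离平方=散度
            for i in range(W) for j in range(i + 1, W)]
    return float(np.mean(divs))

# 分歧超过阈值时,才邀请众包工作者补充标注(阈值 0.1 为假设)
dists = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8]])
print(annotation_disagreement(dists) > 0.1)
```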

医学相关(5篇)

【1】 The Invisible COVID-19 Crisis: Post-Traumatic Stress Disorder Risk Among Frontline Physicians Treating COVID-19 Patients 标题:看不见的冠状病毒危机:治疗冠状病毒患者的一线医生中的创伤后应激障碍风险 链接:https://arxiv.org/abs/2111.04441

作者:Sayanti Mukherjee,Lance Rintamaki,Janet L. Shucard,Zhiyuan Wei,Lindsey E. Carlasare,Christine A. Sinsky 机构:Department of Industrial and Systems Engineering, School of Engineering and Applied Sciences, University at Buffalo – The State University of New York, Buffalo NY; Department of Communication, College of Arts and Sciences 备注:34 pages, 5 Tables, 2 Figues, Under review with Journal of Psychiatric Research 摘要:本研究评估了美国一线医生(治疗COVID-19患者)与二线医生(不治疗COVID-19患者)的创伤后应激障碍(PTSD)状况,并识别了与较高PTSD风险相关因素的显著性和模式。2020年8月至9月期间,我们对COVID-19病例最多的18个州的执业医师开展了一项基于网络的横断面调查。在1478名作出回应的医生中,1017名完成了PTSD检查表(PCL-5)。首先,用PCL-5比较两组医生的症状认可情况:一线医生中对PCL-5症状有临床显著认可的比例更高,PCL-5评分也更高。其次,通过分析变量重要性和部分依赖图,利用逻辑回归和七种非线性机器学习(ML)算法识别PTSD风险的潜在预测因子。PTSD风险的预测因子包括认知/心理测量、职业特征、工作经历、社会支持、人口统计学特征和工作场所特征。重要的是,最终的ML模型(随机森林)识别出了一线医生PTSD风险中损害性与保护性预测因子的模式。关键的损害性因素包括抑郁、倦怠、消极应对、对感染/传播新冠病毒的恐惧、感知到的污名,以及治疗COVID-19患者的资源不足;保护性因素包括心理弹性以及来自雇主/朋友/家人/重要他人的支持。这项研究凸显了ML算法在揭示一线医生PTSD保护性/损害性风险因素之间非线性关系方面的价值,这可以更好地为干预措施提供依据,帮助医疗系统为未来的流行病/大流行做好准备。 摘要:This study evaluated post traumatic stress disorder (PTSD) among frontline US physicians (treating COVID-19 patients) in comparison with second-line physicians (not treating COVID-19 patients), and identified the significance and patterns of factors associated with higher PTSD risk. A cross-sectional, web-based survey was deployed during August and September, 2020, to practicing physicians in the 18 states with the largest COVID-19 cases. Among 1,478 responding physicians, 1,017 completed the PTSD Checklist (PCL-5). First, the PCL-5 was used to compare symptom endorsement between the two physician groups. A greater percentage of frontline than second-line physicians had clinically significant endorsement of PCL-5 symptoms and higher PCL-5 scores. Second, logistic regression and seven nonlinear machine learning (ML) algorithms were leveraged to identify potential predictors of PTSD risk by analyzing variable importance and partial dependence plots. Predictors of PTSD risk included cognitive/psychological measures, occupational characteristics, work experiences, social support, demographics, and workplace characteristics. Importantly, the final ML model random forest, identified patterns of both damaging and protective predictors of PTSD risk among frontline physicians. Key damaging factors included depression, burnout, negative coping, fears of contracting/transmitting COVID-19, perceived stigma, and insufficient resources to treat COVID-19 patients. Protective factors included resilience and support from employers/friends/family/significant others. This study underscores the value of ML algorithms to uncover nonlinear relationships among protective/damaging risk factors for PTSD in frontline physicians, which may better inform interventions to prepare healthcare systems for future epidemics/pandemics.

【2】 Spirometry-based airways disease simulation and recognition using Machine Learning approaches 标题:基于肺活量测定与机器学习方法的气道疾病模拟与识别 链接:https://arxiv.org/abs/2111.04315

作者:Riccardo Dio,André Galligo,Angelos Mantzaflaris,Benjamin Mauroy 机构:Université Côte d'Azur, Inria, France; Université Côte d'Azur, CNRS, LJAD, VADER Center, France 备注:None 摘要:本研究的目的是为医生提供自动、快速识别气道疾病的手段。在这项工作中,我们主要关注可以用肺活量计方便记录的测量。该框架中使用的信号由肺的线性双室模型模拟产生,这使我们能够在静息通气(潮式呼吸)的假设下模拟通气过程。通过改变阻力与弹性参数,生成了模拟健康、肺纤维化和哮喘呼吸的数据样本。在这些合成数据上测试了不同的机器学习模型并评估了其性能。除朴素贝叶斯分类器外,所有分类器的准确率都至少达到99%。这证明了机器学习能够基于合成肺活量数据准确区分疾病这一概念,为该课题的进一步发展铺平了道路,特别是在真实数据上测试模型。 摘要:The purpose of this study is to provide means to physicians for automated and fast recognition of airways diseases. In this work, we mainly focus on measures that can be easily recorded using a spirometer. The signals used in this framework are simulated using the linear bi-compartment model of the lungs. This allows us to simulate ventilation under the hypothesis of ventilation at rest (tidal breathing). By changing the resistive and elastic parameters, data samples are realized simulating healthy, fibrosis and asthma breathing. On this synthetic data, different machine learning models are tested and their performance is assessed. All but the Naive Bayes classifier show accuracy of at least 99%. This represents a proof of concept that Machine Learning can accurately differentiate diseases based on manufactured spirometry data. This paves the way for further developments on the topic, notably testing the model on real data.
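线性双室肺模型可以示意性地写成两条并联的"阻力-顺应性"支路,由同一驱动压驱动;下面用scipy对潮式呼吸做最小模拟(方程形式为常见的一阶线性RC模型,参数数值为笔者随意设定,仅作演示):

```python
import numpy as np
from scipy.integrate import solve_ivp

R = np.array([2.0, 4.0])    # 两室的气道阻力(假设值,cmH2O·s/L)
C = np.array([0.05, 0.03])  # 两室的顺应性(假设值,L/cmH2O)

def P_aw(t):                # 驱动压:半正弦近似的潮式呼吸
    return 5.0 * max(np.sin(2 * np.pi * t / 4.0), 0.0)

def rhs(t, V):              # 每室满足 R_i * dV_i/dt + V_i / C_i = P_aw(t)
    return (P_aw(t) - V / C) / R

sol = solve_ivp(rhs, (0.0, 12.0), [0.0, 0.0], max_step=0.01)
total_volume = sol.y.sum(axis=0)   # 肺活量计记录的总容积近似
print(total_volume.max())
```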

【3】 Deep Learning Based Model for Breast Cancer Subtype Classification 标题:基于深度学习的乳腺癌亚型分类模型 链接:https://arxiv.org/abs/2111.03923

作者:Sheetal Rajpal,Virendra Kumar,Manoj Agarwal,Naveen Kumar 机构:Department of Computer Science, University of Delhi, India; Department of Nuclear Magnetic Resonance Imaging, All India Institute of Medical Sciences, New Delhi, India; Department of Computer Science, Hans Raj College, University of Delhi, Delhi, India 备注:Paper have been accepted for publication in ICACET 2021 摘要:乳腺癌长期以来一直是女性死亡的主要原因。得益于能够记录基因表达数据的RNA测序工具,诊断、治疗和预后如今成为可能。鉴于分子分型与制定临床策略和预后密切相关,本文聚焦于利用基因表达数据将乳腺癌分为四个亚型,即Basal、Her2、LumA和LumB。在第一阶段,我们提出了一个基于深度学习的模型,使用自动编码器进行降维:借助自动编码器,特征集的大小从20530个基因表达值降至500个。该编码表示被送入第二阶段的深度神经网络,用于将患者划分为四种乳腺癌分子亚型。通过部署阶段1与阶段2组合的网络,我们在TCGA乳腺癌数据集上取得了0.907的平均10折(10-fold)测试精度。如分类精度的箱线图所示,所提框架在10次不同运行中都相当稳健。与文献报道的相关工作相比,我们取得了有竞争力的结果。总之,所提出的两阶段深度学习模型能够准确地对四种乳腺癌亚型进行分类,凸显了自动编码器推导紧凑表示的能力以及神经网络分类器正确标记乳腺癌患者的能力。 摘要:Breast cancer has long been a prominent cause of mortality among women. Diagnosis, therapy, and prognosis are now possible, thanks to the availability of RNA sequencing tools capable of recording gene expression data. Molecular subtyping being closely related to devising clinical strategy and prognosis, this paper focuses on the use of gene expression data for the classification of breast cancer into four subtypes, namely, Basal, Her2, LumA, and LumB. In stage 1, we suggested a deep learning-based model that uses an autoencoder to reduce dimensionality. The size of the feature set is reduced from 20,530 gene expression values to 500 by using an autoencoder. This encoded representation is passed to the deep neural network of the second stage for the classification of patients into four molecular subtypes of breast cancer. By deploying the combined network of stages 1 and 2, we have been able to attain a mean 10-fold test accuracy of 0.907 on the TCGA breast cancer dataset. The proposed framework is fairly robust throughout 10 different runs, as shown by the boxplot for classification accuracy. Compared to related work reported in the literature, we have achieved a competitive outcome. In conclusion, the proposed two-stage deep learning-based model is able to accurately classify four breast cancer subtypes, highlighting the autoencoder's capacity to deduce the compact representation and the neural network classifier's ability to correctly label breast cancer patients.
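两阶段结构可以用PyTorch粗略示意如下(隐藏层宽度与训练细节为假设,仅保留"20530维基因表达 → 500维编码 → 4类亚型"的骨架):

```python
import torch
import torch.nn as nn

class AE(nn.Module):
    """第一阶段:自动编码器,把 20530 维基因表达压缩到 500 维。"""
    def __init__(self, d_in=20530, d_code=500):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(d_in, 2000), nn.ReLU(),
                                 nn.Linear(2000, d_code))
        self.dec = nn.Sequential(nn.Linear(d_code, 2000), nn.ReLU(),
                                 nn.Linear(2000, d_in))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

ae = AE()
# 第二阶段:在 500 维编码上区分 Basal / Her2 / LumA / LumB 四类
clf = nn.Sequential(nn.Linear(500, 128), nn.ReLU(), nn.Linear(128, 4))

x = torch.randn(8, 20530)                      # 假数据:8 个样本的基因表达
recon, z = ae(x)
recon_loss = nn.functional.mse_loss(recon, x)  # 阶段 1 的重构损失
logits = clf(z.detach())                       # 阶段 2 的亚型 logits
```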

【4】 Acquisition-invariant brain MRI segmentation with informative uncertainties 标题:具有信息性不确定性的采集不变脑MRI分割 链接:https://arxiv.org/abs/2111.04094

作者:Pedro Borges,Richard Shaw,Thomas Varsavsky,Kerstin Klaser,David Thomas,Ivana Drobnjak,Sebastien Ourselin,M Jorge Cardoso 机构:Department of Medical Physics and Biomedical Engineering, UCL, UK; School of Biomedical Engineering and Imaging Sciences, KCL, UK; Dementia Research Centre, UCL, UK 备注:25 pages, 8 figures 摘要:结合多站点数据可以增强并揭示趋势,但这一任务会受到站点特异性协变量的干扰,这些协变量可能使数据乃至任何下游分析产生偏差。现有的事后多站点校正方法假设较强,在现实场景中往往不成立。算法在设计时就应考虑站点特异性影响(例如由序列参数选择带来的影响),并且在泛化失败时,应能够通过显式的不确定性建模识别此类失败。本工作展示了这样一种算法:它能在分割任务中对采集物理过程保持鲁棒,同时对不确定性进行建模。我们证明,我们的方法不仅能在保持分割质量的前提下泛化到完全留出的数据集,而且能够兼顾站点特异的序列选择,这也使其可以作为数据协调(harmonisation)工具使用。 摘要:Combining multi-site data can strengthen and uncover trends, but is a task that is marred by the influence of site-specific covariates that can bias the data and therefore any downstream analyses. Post-hoc multi-site correction methods exist but have strong assumptions that often do not hold in real-world scenarios. Algorithms should be designed in a way that can account for site-specific effects, such as those that arise from sequence parameter choices, and in instances where generalisation fails, should be able to identify such a failure by means of explicit uncertainty modelling. This body of work showcases such an algorithm, that can become robust to the physics of acquisition in the context of segmentation tasks, while simultaneously modelling uncertainty. We demonstrate that our method not only generalises to complete holdout datasets, preserving segmentation quality, but does so while also accounting for site-specific sequence choices, which also allows it to perform as a harmonisation tool.

【5】 Learn-Morph-Infer: a new way of solving the inverse problem for brain tumor modeling 标题:学习-变形-推断:一种求解脑瘤模型逆问题的新方法 链接:https://arxiv.org/abs/2111.04090

作者:Ivan Ezhov,Kevin Scibilia,Katharina Franitza,Felix Steinbauer,Suprosanna Shit,Lucas Zimmer,Jana Lipkova,Florian Kofler,Johannes Paetzold,Luca Canalini,Diana Waldmannstetter,Martin Menten,Marie Metz,Benedikt Wiestler,Bjoern Menze 机构:Department of Informatics, TUM, Munich, Germany; TranslaTUM - Central Institute for Translational Cancer Research, TUM, Munich, Germany; Department of Mechanical Engineering, TUM, Munich, Germany 摘要:通过获取肿瘤细胞浓度的空间分布,目前确诊脑肿瘤患者的治疗计划可以显著受益。现有的诊断手段,如磁共振成像(MRI),能够对高细胞密度区域提供足够好的对比。然而,它们无法刻画低浓度区域,而低浓度区域往往是治疗后肿瘤复发的来源。肿瘤生长的数值模拟可以通过给出肿瘤细胞完整空间分布的估计来补充影像信息。近年来发表了大量基于医学图像的肿瘤建模文献,其中包括描述前向肿瘤生长模型的不同数学形式;同时也发展了各种参数推断方案,以实现高效的肿瘤模型个性化,即求解逆问题。然而,所有现有方法的共同缺点是模型个性化的时间复杂度过高,阻碍了将建模整合进临床流程的可能。在这项工作中,我们提出了一种从T1Gd和FLAIR MRI医学扫描推断患者特异性脑肿瘤空间分布的方法。该方法被称为Learn-Morph-Infer,在广泛可得的硬件上可达到分钟量级的实时性能,并且对不同复杂度的肿瘤模型(如反应-扩散和反应-平流-扩散模型)计算时间保持稳定。我们相信,所提出的逆问题求解方法不仅为脑肿瘤个性化的临床转化架起了桥梁,也可推广到其他科学与工程领域。 摘要:Current treatment planning of patients diagnosed with brain tumor could significantly benefit by accessing the spatial distribution of tumor cell concentration. Existing diagnostic modalities, such as magnetic-resonance imaging (MRI), contrast sufficiently well areas of high cell density. However, they do not portray areas of low concentration, which can often serve as a source for the secondary appearance of the tumor after treatment. Numerical simulations of tumor growth could complement imaging information by providing estimates of full spatial distributions of tumor cells. Over recent years a corpus of literature on medical image-based tumor modeling was published. It includes different mathematical formalisms describing the forward tumor growth model. Alongside, various parametric inference schemes were developed to perform an efficient tumor model personalization, i.e. solving the inverse problem. However, the unifying drawback of all existing approaches is the time complexity of the model personalization that prohibits a potential integration of the modeling into clinical settings. In this work, we introduce a methodology for inferring patient-specific spatial distribution of brain tumor from T1Gd and FLAIR MRI medical scans. Coined as \textit{Learn-Morph-Infer} the method achieves real-time performance in the order of minutes on widely available hardware and the compute time is stable across tumor models of different complexity, such as reaction-diffusion and reaction-advection-diffusion models. We believe the proposed inverse solution approach not only bridges the way for clinical translation of brain tumor personalization but can also be adopted to other scientific and engineering domains.
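文中提到的反应-扩散类肿瘤生长模型,常见形式是Fisher-KPP方程(示意;$c$为归一化的肿瘤细胞浓度,$D$为扩散张量,$\rho$为增殖率,具体形式以论文为准):

```latex
\frac{\partial c}{\partial t} = \nabla \cdot \big( D \, \nabla c \big) + \rho \, c \, (1 - c)
```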

蒸馏|知识提取(2篇)

【1】 Oracle Teacher: Towards Better Knowledge Distillation 标题:甲骨文老师:迈向更好的知识蒸馏 链接:https://arxiv.org/abs/2111.03664

作者:Ji Won Yoon,Hyung Yong Kim,Hyeonseung Lee,Sunghwan Ahn,Nam Soo Kim 备注:This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible 摘要:知识蒸馏(KD)以一种有效的模型压缩方法而著称,旨在将较大网络(教师)的知识迁移到小得多的网络(学生)中。传统的KD方法通常采用以有监督方式训练的教师模型,其中输出标签仅被当作目标。在此有监督方案的基础上进一步扩展,我们为KD引入了一种新型教师模型,即Oracle Teacher,它同时利用源输入和输出标签的嵌入来提取更准确的知识并传递给学生。该模型遵循Transformer网络的编码器-解码器注意力结构,使模型能够关注输出标签中的相关信息。我们在三种不同的序列学习任务上进行了广泛实验:语音识别、场景文本识别和机器翻译。实验结果表明,所提模型在这些任务上提升了学生模型的表现,同时显著缩短了教师模型的训练时间。 摘要:Knowledge distillation (KD), best known as an effective method for model compression, aims at transferring the knowledge of a bigger network (teacher) to a much smaller network (student). Conventional KD methods usually employ the teacher model trained in a supervised manner, where output labels are treated only as targets. Extending this supervised scheme further, we introduce a new type of teacher model for KD, namely Oracle Teacher, that utilizes the embeddings of both the source inputs and the output labels to extract a more accurate knowledge to be transferred to the student. The proposed model follows the encoder-decoder attention structure of the Transformer network, which allows the model to attend to related information from the output labels. Extensive experiments are conducted on three different sequence learning tasks: speech recognition, scene text recognition, and machine translation. From the experimental results, we empirically show that the proposed model improves the students across these tasks while achieving a considerable speed-up in the teacher model's training time.
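作为背景,传统(Hinton式)KD的蒸馏损失是"软目标KL+硬标签交叉熵"的加权和;下面是PyTorch示意(温度T与权重alpha为常用取值;这只是常规KD目标,Oracle Teacher的师生训练细节见原文):

```python
import torch
import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """常规知识蒸馏损失的示意实现。"""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                    F.softmax(teacher_logits / T, dim=-1),
                    reduction="batchmean") * (T * T)  # 温度平方补偿梯度尺度
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```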

【2】 Class Token and Knowledge Distillation for Multi-head Self-Attention Speaker Verification Systems 标题:面向多头自注意说话人确认系统的类令牌与知识蒸馏 链接:https://arxiv.org/abs/2111.03842

作者:Victoria Mingote,Antonio Miguel,Alfonso Ortega,Eduardo Lleida 机构:Aragón Institute for Engineering Research (I3A), University of Zaragoza 摘要:本文探索了三种新方法,用于提升基于深度神经网络(DNN)、采用多头自注意(MSA)机制和记忆层的说话人确认(SV)系统的性能。首先,我们提出使用一种称为类令牌(Class token)的可学习向量取代全局平均池化机制来提取嵌入。与全局平均池化不同,我们的方案考虑了输入的时间结构,而这对文本相关SV任务十分重要。类令牌在第一个MSA层之前与输入拼接,其在输出处的状态用于预测类别。为获得额外的鲁棒性,我们引入了两种做法:其一,我们发展了类令牌的贝叶斯估计;其二,我们增加了一个蒸馏表示令牌,按照知识蒸馏(KD)的思想与类令牌相结合,训练一对师生网络。该蒸馏令牌被训练来模仿教师网络的预测,而类令牌则复现真实标签。所有策略都在RSR2015 Part II和DeepMine Part 1数据库上针对文本相关SV进行了测试,与使用平均池化机制提取平均嵌入的相同结构相比,取得了有竞争力的结果。 摘要:This paper explores three novel approaches to improve the performance of speaker verification (SV) systems based on deep neural networks (DNN) using Multi-head Self-Attention (MSA) mechanisms and memory layers. Firstly, we propose the use of a learnable vector called Class token to replace the average global pooling mechanism to extract the embeddings. Unlike global average pooling, our proposal takes into account the temporal structure of the input what is relevant for the text-dependent SV task. The class token is concatenated to the input before the first MSA layer, and its state at the output is used to predict the classes. To gain additional robustness, we introduce two approaches. First, we have developed a Bayesian estimation of the class token. Second, we have added a distilled representation token for training a teacher-student pair of networks using the Knowledge Distillation (KD) philosophy, which is combined with the class token. This distillation token is trained to mimic the predictions from the teacher network, while the class token replicates the true label. All the strategies have been tested on the RSR2015-Part II and DeepMine-Part 1 databases for text-dependent SV, providing competitive results compared to the same architecture using the average pooling mechanism to extract average embeddings.
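类令牌与蒸馏令牌的拼接方式可用如下PyTorch片段示意(遵循ViT/DeiT的常见做法;编码器层数、维度均为假设,具体结构以论文为准):

```python
import torch
import torch.nn as nn

B, T, d = 4, 120, 256                      # 批大小、帧数、特征维度(假设值)
x = torch.randn(B, T, d)                   # 语音帧级特征序列

cls_token = nn.Parameter(torch.zeros(1, 1, d))    # 可学习的类令牌
dist_token = nn.Parameter(torch.zeros(1, 1, d))   # 可学习的蒸馏令牌
tokens = torch.cat([cls_token.expand(B, -1, -1),
                    dist_token.expand(B, -1, -1), x], dim=1)

encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d, nhead=8, batch_first=True),
    num_layers=2)
out = encoder(tokens)
cls_emb, dist_emb = out[:, 0], out[:, 1]   # 分别拟合真实标签与教师预测
```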

聚类(1篇)

【1】 Clustering and Structural Robustness in Causal Diagrams 标题:因果图中的聚类和结构稳健性 链接:https://arxiv.org/abs/2111.04513

作者:Santtu Tikka,Jouni Helske,Juha Karvanen 摘要:图通常被用来表示和可视化因果关系。当变量较少时,这种方法能对场景给出简洁清晰的呈现;但随着研究变量数量的增加,图示方法可能变得不切实际,表达的清晰度也随之下降。变量聚类是缩减因果图规模的一种自然方式,但若随意实施,可能会错误地改变因果关系的本质属性。我们定义了一种特殊类型的簇,称为中转簇(transit cluster),并证明在一定条件下它能保持因果效应的可识别性。我们给出了一个可靠且完备(sound and complete)的算法,用于在给定的图中查找所有中转簇,并演示了聚类如何简化因果效应的识别。我们还研究了逆问题:从一个聚类后的图出发,寻找使因果效应可识别性保持不变的扩展图。我们证明,这种结构鲁棒性与中转簇密切相关。 摘要:Graphs are commonly used to represent and visualize causal relations. For a small number of variables, this approach provides a succinct and clear view of the scenario at hand. As the number of variables under study increases, the graphical approach may become impractical, and the clarity of the representation is lost. Clustering of variables is a natural way to reduce the size of the causal diagram but it may erroneously change the essential properties of the causal relations if implemented arbitrarily. We define a specific type of cluster, called transit cluster, that is guaranteed to preserve the identifiability properties of causal effects under certain conditions. We provide a sound and complete algorithm for finding all transit clusters in a given graph and demonstrate how clustering can simplify the identification of causal effects. We also study the inverse problem, where one starts with a clustered graph and looks for extended graphs where the identifiability properties of causal effects remain unchanged. We show that this kind of structural robustness is closely related to transit clusters.

超分辨率|去噪|去模糊|去雾(1篇)

【1】 Estimating High Order Gradients of the Data Distribution by Denoising 标题:用去噪方法估计数据分布的高阶梯度 链接:https://arxiv.org/abs/2111.04726

作者:Chenlin Meng,Yang Song,Wenzhe Li,Stefano Ermon 机构:Stanford University, Tsinghua University 备注:NeurIPS 2021 摘要:通过去噪分数匹配可以高效地估计数据密度的一阶导数,它已成为图像生成和音频合成等许多应用中的重要组成部分。高阶导数提供了关于数据分布的额外局部信息,并使新的应用成为可能。虽然可以通过对学得的密度模型做自动微分来估计它们,但这会放大估计误差,且在高维情形下代价高昂。为克服这些局限,我们提出了一种直接从样本估计数据密度高阶导数(分数)的方法。我们首先表明,去噪分数匹配可以解释为Tweedie公式的一个特例。借助关于高阶矩的Tweedie公式,我们将去噪分数匹配推广到估计高阶导数。我们的实验表明,用该方法训练的模型能够比自动微分更高效、更准确地逼近二阶导数。我们还展示了模型可用于量化去噪过程中的不确定性,并通过Ozaki离散化提高Langevin动力学的混合速度,用于合成数据和自然图像的采样。 摘要:The first order derivative of a data density can be estimated efficiently by denoising score matching, and has become an important component in many applications, such as image generation and audio synthesis. Higher order derivatives provide additional local information about the data distribution and enable new applications. Although they can be estimated via automatic differentiation of a learned density model, this can amplify estimation errors and is expensive in high dimensional settings. To overcome these limitations, we propose a method to directly estimate high order derivatives (scores) of a data density from samples. We first show that denoising score matching can be interpreted as a particular case of Tweedie's formula. By leveraging Tweedie's formula on higher order moments, we generalize denoising score matching to estimate higher order derivatives. We demonstrate empirically that models trained with the proposed method can approximate second order derivatives more efficiently and accurately than via automatic differentiation. We show that our models can be used to quantify uncertainty in denoising and to improve the mixing speed of Langevin dynamics via Ozaki discretization for sampling synthetic data and natural images.
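作为参考,一阶与二阶Tweedie公式的标准形式如下(设$\tilde{\mathbf{x}} = \mathbf{x} + \boldsymbol{\epsilon}$,$\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \sigma^2 \mathbf{I})$;这是文献中的通用写法,论文在此基础上推广到更高阶矩):

```latex
% 一阶:去噪后验均值由一阶得分(score)给出
\mathbb{E}[\mathbf{x} \mid \tilde{\mathbf{x}}]
= \tilde{\mathbf{x}} + \sigma^{2} \, \nabla_{\tilde{\mathbf{x}}} \log p(\tilde{\mathbf{x}})

% 二阶:去噪后验协方差与二阶得分相联系
\mathrm{Cov}[\mathbf{x} \mid \tilde{\mathbf{x}}]
= \sigma^{2} \mathbf{I} + \sigma^{4} \, \nabla^{2}_{\tilde{\mathbf{x}}} \log p(\tilde{\mathbf{x}})
```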

联邦学习|隐私保护|加密(1篇)

【1】 Federated Learning Based on Dynamic Regularization 标题:基于动态正则化的联邦学习 链接:https://arxiv.org/abs/2111.04263

作者:Durmus Alp Emre Acar,Yue Zhao,Ramon Matas Navarro,Matthew Mattina,Paul N. Whatmough,Venkatesh Saligrama 机构:Boston University 备注:Slightly extended version of ICLR 2021 Paper 摘要:我们提出了一种新的联邦学习方法,用于分布式训练神经网络模型,其中服务器在每一轮协调随机选择的设备子集之间的协作。我们主要从通信的角度看待联邦学习问题,并允许更多的设备端计算以节省传输成本。我们指出了一个基本困境:设备级局部经验损失的最小值与全局经验损失的最小值并不一致。与近期工作(或尝试不精确最小化,或利用设备并行计算梯度)不同,我们在每一轮为每个设备提出了一个动态正则化器,使得在极限情况下全局解与设备解对齐。我们通过真实与合成数据上的实证结果以及分析结果证明,我们的方案在凸与非凸设定下都能带来高效训练,同时对设备异质性完全不可知,并对大量设备、部分参与和不平衡数据保持鲁棒。 摘要:We propose a novel federated learning method for distributively training neural network models, where the server orchestrates cooperation between a subset of randomly chosen devices in each round. We view Federated Learning problem primarily from a communication perspective and allow more device level computations to save transmission costs. We point out a fundamental dilemma, in that the minima of the local-device level empirical loss are inconsistent with those of the global empirical loss. Different from recent prior works, that either attempt inexact minimization or utilize devices for parallelizing gradient computation, we propose a dynamic regularizer for each device at each round, so that in the limit the global and device solutions are aligned. We demonstrate both through empirical results on real and synthetic data as well as analytical results that our scheme leads to efficient training, in both convex and non-convex settings, while being fully agnostic to device heterogeneity and robust to large number of devices, partial participation and unbalanced data.
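文中"动态正则化器"的思想可示意性地写成如下的设备端局部目标:线性项由该设备上一轮局部解处的梯度给出,近端项把局部解拉向当前服务器模型,从而使局部驻点在极限下与全局最优对齐(记号为笔者简化,细节以论文为准):

```latex
\theta_{k}^{t} \in \arg\min_{\theta} \;
L_{k}(\theta)
\; - \; \big\langle \nabla L_{k}(\theta_{k}^{t-1}), \, \theta \big\rangle
\; + \; \frac{\alpha}{2} \, \big\| \theta - \theta^{t-1} \big\|^{2}
```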

推理|分析|理解|解释(10篇)

【1】 HAPSSA: Holistic Approach to PDF Malware Detection Using Signal and Statistical Analysis 标题:HAPSSA:基于信号和统计分析的PDF恶意软件整体检测方法 链接:https://arxiv.org/abs/2111.04703

作者:Tajuddin Manhar Mohammed,Lakshmanan Nataraj,Satish Chikkagoudar,Shivkumar Chandrasekaran,B. S. Manjunath 备注:Submitted version - MILCOM 2021 IEEE Military Communications Conference 摘要:恶意PDF文档对各类安全组织构成严重威胁,这些组织需要现代威胁情报平台来有效分析和刻画PDF恶意软件的身份与行为。最先进的方法使用机器学习(ML)来学习刻画PDF恶意软件的特征。然而,ML模型往往容易受到规避攻击:对手通过混淆恶意代码来逃避杀毒软件的检测。在本文中,我们给出了一种简单而有效的PDF恶意软件整体检测方法,它利用对恶意软件二进制文件的信号分析与统计分析,包括组合来自多种静态和动态恶意软件检测方法的正交特征空间模型,从而在面对代码混淆时具备普遍的鲁棒性。在一个包含恶意与良性样本、近30000个PDF文件的数据集上,我们表明该整体方法保持了很高的PDF恶意软件检测率(99.92%),甚至能检测出通过简单手段(去除恶意软件作者为隐藏恶意代码而做的混淆)新创建的恶意文件,而大多数杀毒软件都无法发现这些文件。 摘要:Malicious PDF documents present a serious threat to various security organizations that require modern threat intelligence platforms to effectively analyze and characterize the identity and behavior of PDF malware. State-of-the-art approaches use machine learning (ML) to learn features that characterize PDF malware. However, ML models are often susceptible to evasion attacks, in which an adversary obfuscates the malware code to avoid being detected by an Antivirus. In this paper, we derive a simple yet effective holistic approach to PDF malware detection that leverages signal and statistical analysis of malware binaries. This includes combining orthogonal feature space models from various static and dynamic malware detection methods to enable generalized robustness when faced with code obfuscations. Using a dataset of nearly 30,000 PDF files containing both malware and benign samples, we show that our holistic approach maintains a high detection rate (99.92%) of PDF malware and even detects new malicious files created by simple methods that remove the obfuscation conducted by malware authors to hide their malware, which are undetected by most antiviruses.

【2】 The Hardness Analysis of Thompson Sampling for Combinatorial Semi-bandits with Greedy Oracle 标题:基于贪婪预言机的组合半老虎机Thompson采样难度分析 链接:https://arxiv.org/abs/2111.04295

作者:Fang Kong,Yueran Yang,Wei Chen,Shuai Li 机构:Shanghai Jiao Tong University, Microsoft Research 备注:Accepted in NeurIPS, 2021 摘要:汤普森采样(TS)在多臂老虎机(bandit)领域引起了广泛关注。它早在20世纪30年代就被提出,但直到近些年才得到理论证明。现有对其在组合多臂老虎机(CMAB)设定下的所有分析,都需要一个精确预言机(exact oracle)对任意输入给出最优解。然而,这样的预言机通常不可行,因为许多组合优化问题是NP难的,只有近似预言机可用。一个例子(Wang和Chen,2018)表明TS无法在某个近似预言机下进行学习;但该预言机并不常见,只针对特定的问题实例设计。TS的收敛性分析能否推广到CMAB中精确预言机之外的情形,仍是一个悬而未决的问题。在本文中,我们在贪婪预言机(greedy oracle)下研究这一问题。贪婪预言机是一种常用的(近似)预言机,对许多(离线)组合优化问题具有理论保证。我们给出了一个量级为$\Omega(\log T/\Delta^2)$的问题相关遗憾下界,以量化TS在贪婪预言机下求解CMAB问题的难度,其中$T$是时间范围,$\Delta$是某种奖励差距。我们还给出了几乎匹配的遗憾上界。这是TS在通用近似预言机下求解CMAB的首个理论结果,打破了TS无法与近似预言机协同工作的误解。 摘要:Thompson sampling (TS) has attracted a lot of interest in the bandit area. It was introduced in the 1930s but has not been theoretically proven until recent years. All of its analysis in the combinatorial multi-armed bandit (CMAB) setting requires an exact oracle to provide optimal solutions with any input. However, such an oracle is usually not feasible since many combinatorial optimization problems are NP-hard and only approximation oracles are available. An example (Wang and Chen, 2018) has shown the failure of TS to learn with an approximation oracle. However, this oracle is uncommon and is designed only for a specific problem instance. It is still an open question whether the convergence analysis of TS can be extended beyond the exact oracle in CMAB. In this paper, we study this question under the greedy oracle, which is a common (approximation) oracle with theoretical guarantees to solve many (offline) combinatorial optimization problems. We provide a problem-dependent regret lower bound of order $\Omega(\log T/\Delta^2)$ to quantify the hardness of TS to solve CMAB problems with greedy oracle, where $T$ is the time horizon and $\Delta$ is some reward gap. We also provide an almost matching regret upper bound. These are the first theoretical results for TS to solve CMAB with a common approximation oracle and break the misconception that TS cannot work with approximation oracles.
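作为背景,下面给出伯努利多臂老虎机上基础Thompson采样的最小实现(Beta共轭后验;在CMAB设定中,argmax一步会被(近似)预言机,例如本文研究的贪婪预言机所取代):

```python
import numpy as np

def thompson_bernoulli(arms_p, T=1000, seed=0):
    """伯努利老虎机上的基础 Thompson 采样示意,返回各臂的后验均值。"""
    rng = np.random.default_rng(seed)
    K = len(arms_p)
    a, b = np.ones(K), np.ones(K)      # Beta(1,1) 先验
    for _ in range(T):
        theta = rng.beta(a, b)         # 从各臂后验中采样
        k = int(np.argmax(theta))      # CMAB 中此步换成(近似)oracle
        r = float(rng.random() < arms_p[k])
        a[k] += r
        b[k] += 1.0 - r
    return a / (a + b)

print(thompson_bernoulli([0.2, 0.5, 0.7]))
```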

【3】 Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis 标题:看看方差!基于Sobol敏感性分析的高效黑盒解释 链接:https://arxiv.org/abs/2111.04138

作者:Thomas Fel,Rémi Cadène,Mathieu Chalvidal,Matthieu Cord,David Vigouroux,Thomas Serre 机构:Carney Institute for Brain Science, Brown University, USA; Sorbonne Université, CNRS, France; Artificial and Natural Intelligence Toulouse Institute, Université de Toulouse, France; Institut de Recherche Technologique Saint-Exupéry, France 备注:NeurIPS2021 摘要:我们提出了一种基于敏感性分析、使用Sobol指数的新归因方法。除了对图像区域的单独贡献建模之外,Sobol指数还提供了一种高效途径,透过方差的视角捕捉图像区域之间的高阶交互及其对神经网络预测的贡献。我们描述了一种方法,借助扰动掩码与高效估计器来应对图像的高维性,使这些指数在高维问题上的计算变得高效。重要的是,我们表明所提方法在视觉(和语言模型)的标准基准上取得了良好的分数,同时与其他黑盒方法相比大幅减少了计算时间,甚至超过了需要访问内部表示的最新白盒方法的精度。我们的代码免费提供:https://github.com/fel-thomas/Sobol-Attribution-Method 摘要:We describe a novel attribution method which is grounded in Sensitivity Analysis and uses Sobol indices. Beyond modeling the individual contributions of image regions, Sobol indices provide an efficient way to capture higher-order interactions between image regions and their contributions to a neural network's prediction through the lens of variance. We describe an approach that makes the computation of these indices efficient for high-dimensional problems by using perturbation masks coupled with efficient estimators to handle the high dimensionality of images. Importantly, we show that the proposed method leads to favorable scores on standard benchmarks for vision (and language models) while drastically reducing the computing time compared to other black-box methods -- even surpassing the accuracy of state-of-the-art white-box methods which require access to internal representations. Our code is freely available: https://github.com/fel-thomas/Sobol-Attribution-Method
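Sobol指数来自方差分解,常用的一阶指数与全效应指数定义如下(示意;$\mathbf{X}_{\sim i}$表示除$X_i$外的所有输入,在本文的归因场景中$Y$大致对应网络对被扰动掩码作用后的输入的预测):

```latex
S_{i} = \frac{\mathrm{Var}_{X_i}\!\big( \mathbb{E}_{\mathbf{X}_{\sim i}}[\, Y \mid X_i \,] \big)}{\mathrm{Var}(Y)},
\qquad
S_{T_i} = \frac{\mathbb{E}_{\mathbf{X}_{\sim i}}\!\big( \mathrm{Var}_{X_i}[\, Y \mid \mathbf{X}_{\sim i} \,] \big)}{\mathrm{Var}(Y)}
```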

【4】 A-PixelHop: A Green, Robust and Explainable Fake-Image Detector 标题:A-PixelHop:一种绿色、健壮、可解释的假图像检测器 链接:https://arxiv.org/abs/2111.04012

作者:Yao Zhu,Xinyu Wang,Hong-Shuo Chen,Ronald Salloum,C. -C. Jay Kuo 机构:Ming Hsieh Department of Electrical Engineering, University of Southern California; School of Computer Science and Engineering, California State University, San Bernardino 摘要:本文提出了一种检测CNN生成图像的新方法,称为Attentive PixelHop(简称A-PixelHop)。它有三个优点:1)计算复杂度低、模型尺寸小;2)对各类生成模型都有很高的检测性能;3)数学上透明。A-PixelHop的设计假设是:在局部区域很难合成高质量的高频分量。它包含四个构建模块:1)选择包含显著高频分量的边缘/纹理块;2)对其应用多个滤波器组,以获得丰富的空间-频谱响应集合作为特征;3)将特征送入多个二值分类器,得到一组软决策;4)设计有效的集成方案,将软决策融合为最终决策。实验结果表明,A-PixelHop在检测CycleGAN生成的图像方面优于最先进的方法,并且能够很好地泛化到未见过的生成模型和数据集。 摘要:A novel method for detecting CNN-generated images, called Attentive PixelHop (or A-PixelHop), is proposed in this work. It has three advantages: 1) low computational complexity and a small model size, 2) high detection performance against a wide range of generative models, and 3) mathematical transparency. A-PixelHop is designed under the assumption that it is difficult to synthesize high-quality, high-frequency components in local regions. It contains four building modules: 1) selecting edge/texture blocks that contain significant high-frequency components, 2) applying multiple filter banks to them to obtain rich sets of spatial-spectral responses as features, 3) feeding features to multiple binary classifiers to obtain a set of soft decisions, 4) developing an effective ensemble scheme to fuse the soft decisions into the final decision. Experimental results show that A-PixelHop outperforms state-of-the-art methods in detecting CycleGAN-generated images. Furthermore, it can generalize well to unseen generative models and datasets.

【5】 Understanding Layer-wise Contributions in Deep Neural Networks through Spectral Analysis 标题:用谱分析理解深度神经网络中的分层贡献 链接:https://arxiv.org/abs/2111.03972

作者:Yatin Dandi,Arthur Jacot 机构:IIT Kanpur, India; Ecole Polytechnique Fédérale de Lausanne, Switzerland, Chair of Statistical Field Theory 摘要:谱分析是一种强大的工具,可以把任何函数分解成更简单的组成部分。在机器学习中,Mercer定理推广了这一思想,为任意核与输入分布提供了一组频率递增的自然基函数。最近,若干工作通过神经切线核的框架将这种分析扩展到了深度神经网络。在这项工作中,我们分析了深度神经网络的逐层谱偏差,并将其与不同层在降低给定目标函数泛化误差方面的贡献联系起来。我们利用Hermite多项式和球谐函数的性质,证明了初始层对定义在单位球面上的高频函数有更大的偏向。我们进一步在高维数据集上给出了验证该理论的深度神经网络实证结果。 摘要:Spectral analysis is a powerful tool, decomposing any function into simpler parts. In machine learning, Mercer's theorem generalizes this idea, providing for any kernel and input distribution a natural basis of functions of increasing frequency. More recently, several works have extended this analysis to deep neural networks through the framework of Neural Tangent Kernel. In this work, we analyze the layer-wise spectral bias of Deep Neural Networks and relate it to the contributions of different layers in the reduction of generalization error for a given target function. We utilize the properties of Hermite polynomials and spherical harmonics to prove that initial layers exhibit a larger bias towards high-frequency functions defined on the unit sphere. We further provide empirical results validating our theory in high dimensional datasets for Deep Neural Networks.

【6】 CloudRCA: A Root Cause Analysis Framework for Cloud Computing Platforms 标题:CloudRCA:一种面向云计算平台的根本原因分析框架 链接:https://arxiv.org/abs/2111.03753

作者:Yingying Zhang,Zhengxiong Guan,Huajie Qian,Leili Xu,Hengbo Liu,Qingsong Wen,Liang Sun,Junwei Jiang,Lunting Fan,Min Ke 机构:Alibaba Group, Hangzhou, China 备注:None 摘要:随着阿里巴巴业务在全球各行业间的扩张,构成阿里云基础设施的大数据云计算平台在服务质量和可靠性方面被提出了更高的要求。然而,由于系统架构复杂,这些平台中的根本原因分析并非易事。在本文中,我们提出了一个名为CloudRCA的根因分析框架,它利用包括关键性能指标(KPI)、日志以及拓扑在内的异构多源数据,并通过最先进的异常检测和日志分析技术提取重要特征。这些工程特征随后被用于知识引导的层次贝叶斯网络(KHBN)模型,以高精度、高效率地推断根本原因。消融研究和全面的实验比较表明,与现有框架相比,CloudRCA:1)在不同云系统中的f1分数始终优于现有方法;2)得益于KHBN的层次结构,能够处理新类型的根本原因;3)对算法配置更加稳健;4)在数据与特征规模上具有更好的可扩展性。实验还表明,可以采用跨平台的迁移学习机制,使准确率进一步提升10%以上。CloudRCA已集成到阿里云的诊断系统中,并应用于MaxCompute、实时计算(Realtime Compute)和Hologres这三个典型的云计算平台。在过去12个月中,它为站点可靠性工程师(SRE)节省了超过20%的故障处理时间,并显著提升了服务可靠性。 摘要:As business of Alibaba expands across the world among various industries, higher standards are imposed on the service quality and reliability of big data cloud computing platforms which constitute the infrastructure of Alibaba Cloud. However, root cause analysis in these platforms is non-trivial due to the complicated system architecture. In this paper, we propose a root cause analysis framework called CloudRCA which makes use of heterogeneous multi-source data including Key Performance Indicators (KPIs), logs, as well as topology, and extracts important features via state-of-the-art anomaly detection and log analysis techniques. The engineered features are then utilized in a Knowledge-informed Hierarchical Bayesian Network (KHBN) model to infer root causes with high accuracy and efficiency. Ablation study and comprehensive experimental comparisons demonstrate that, compared to existing frameworks, CloudRCA 1) consistently outperforms existing approaches in f1-score across different cloud systems; 2) can handle novel types of root causes thanks to the hierarchical structure of KHBN; 3) performs more robustly with respect to algorithmic configurations; and 4) scales more favorably in the data and feature sizes. Experiments also show that a cross-platform transfer learning mechanism can be adopted to further improve the accuracy by more than 10%. CloudRCA has been integrated into the diagnosis system of Alibaba Cloud and employed in three typical cloud computing platforms including MaxCompute, Realtime Compute and Hologres. It saves Site Reliability Engineers (SREs) more than 20% in the time spent on resolving failures in the past twelve months and improves service reliability significantly.

【7】 Defect Detection on Semiconductor Wafers by Distribution Analysis 标题:基于分布分析的半导体晶圆缺陷检测 链接:https://arxiv.org/abs/2111.03727

作者:Thomas Olschewski 备注:40 pages, 10 figures 摘要:提出了一种基于分布分析的对象分类方法。此外,还提出了一种寻找相关特征的方法,并将该算法与另一种分类算法进行了统一。所提出的分类算法已成功应用于来自晶圆制造的真实测量数据,覆盖若干产品类型的近十万颗芯片。该算法更倾向于在低维搜索空间中寻找最佳评分器,而不是在高维搜索空间中寻找一个较好的评分器。我们的方法的优点在于速度快(准线性),并且在真实晶圆数据上达到了良好到极好的预测或检测质量。 摘要:A method for object classification that is based on distribution analysis is proposed. In addition, a method for finding relevant features and the unification of this algorithm with another classification algorithm is proposed. The presented classification algorithm has been applied successfully to real-world measurement data from wafer fabrication of close to hundred thousand chips of several product types. The presented algorithm prefers finding the best rater in a low-dimensional search space over finding a good rater in a high-dimensional search space. Our approach is interesting in that it is fast (quasi-linear) and reached good to excellent prediction or detection quality for real-world wafer data.

【8】 Leveraging Sentiment Analysis Knowledge to Solve Emotion Detection Tasks 标题:利用情感分析知识解决情感检测任务 链接:https://arxiv.org/abs/2111.03715

作者:Maude Nguyen-The,Guillaume-Alexandre Bilodeau,Jan Rockemann 机构:Polytechnique Montréal, Airudi 摘要:识别和理解文本中潜在的情感或情绪是多种自然语言处理应用程序的关键组成部分。虽然简单极性的情感分析是一个研究得很充分的主题,但仅使用文本数据识别更复杂、更细粒度的情绪方面的进展较少。在本文中,我们提出了一个融合适配器层(Fusion of Adapter layers)的基于Transformer的模型,它利用更简单的情感分析任务中的知识,仅使用文本模态来改进大规模数据集(如CMU-MOSEI)上的情绪检测任务。结果表明,我们提出的方法与其他方法相比具有竞争力。即使只使用文本模态,我们也在CMU-MOSEI上获得了最先进的情绪识别结果。 摘要:Identifying and understanding underlying sentiment or emotions in text is a key component of multiple natural language processing applications. While simple polarity sentiment analysis is a well-studied subject, fewer advances have been made in identifying more complex, finer-grained emotions using only textual data. In this paper, we present a Transformer-based model with a Fusion of Adapter layers which leverages knowledge from more simple sentiment analysis tasks to improve the emotion detection task on large scale dataset, such as CMU-MOSEI, using the textual modality only. Results show that our proposed method is competitive with other approaches. We obtained state-of-the-art results for emotion recognition on CMU-MOSEI even while using only the textual modality.

【9】 Consistent Sufficient Explanations and Minimal Local Rules for explaining regression and classification models 标题:用于解释回归和分类模型的一致充分解释和最小局部规则 链接:https://arxiv.org/abs/2111.04658

作者:Salim I. Amoukou,Nicolas J. B Brunel 备注:8 pages, 2 figures, 1 table 摘要:为了解释任何模型的决策,我们扩展了概率充分解释(P-SE)的概念。对于每个实例,该方法选择一个最小特征子集,使其在移除其他特征后仍能以高概率产生相同的预测。P-SE的关键是计算维持相同预测的条件概率。为此,我们通过随机森林为任何数据$(\boldsymbol{X},Y)$引入了该概率的一个精确而快速的估计器,并通过对其一致性的理论分析证明了其有效性。由此,我们将P-SE推广到回归问题。此外,我们还能处理非二值特征,既无需学习$X$的分布,也无需拥有用于预测的模型。最后,我们基于P-SE提出了面向回归/分类的局部规则解释,并将我们的方法与其他可解释AI方法进行了比较。这些方法以Python包的形式公开发布于\url{www.github.com/salimamoukou/acv00}。 摘要:To explain the decision of any model, we extend the notion of probabilistic Sufficient Explanations (P-SE). For each instance, this approach selects the minimal subset of features that is sufficient to yield the same prediction with high probability, while removing other features. The crux of P-SE is to compute the conditional probability of maintaining the same prediction. Therefore, we introduce an accurate and fast estimator of this probability via random Forests for any data $(\boldsymbol{X}, Y)$ and show its efficiency through a theoretical analysis of its consistency. As a consequence, we extend the P-SE to regression problems. In addition, we deal with non-binary features, without learning the distribution of $X$ nor having the model for making predictions. Finally, we introduce local rule-based explanations for regression/classification based on the P-SE and compare our approaches w.r.t other explainable AI methods. These methods are publicly available as a Python package at \url{www.github.com/salimamoukou/acv00}.
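下面给出这一思路的示意性草图(并非论文或acv00包的原始实现):用随机森林估计"固定部分特征、其余特征重采样时预测保持不变"的概率,并贪心搜索最小充分特征子集。为简化,此处用训练数据的边际重采样近似论文中的条件概率,阈值与采样次数均为假设:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def same_pred_prob(model, x, subset, X_ref, n_mc=200):
    """估计仅固定 subset 中的特征、其余按参考数据重采样时,预测保持不变的概率。"""
    rng = np.random.default_rng(0)
    base = model.predict(x.reshape(1, -1))[0]
    samples = X_ref[rng.integers(0, len(X_ref), n_mc)].copy()
    samples[:, subset] = x[subset]            # 固定已选特征,其余特征被"移除"
    return np.mean(model.predict(samples) == base)

def sufficient_explanation(model, x, X_ref, pi=0.9):
    """贪心加入特征,直到维持预测的概率达到阈值 pi(最小充分解释的近似)。"""
    subset, remaining = [], list(range(len(x)))
    while remaining:
        best = max(remaining, key=lambda j: same_pred_prob(model, x, subset + [j], X_ref))
        subset.append(best)
        remaining.remove(best)
        if same_pred_prob(model, x, subset, X_ref) >= pi:
            break
    return subset

X = np.random.randn(300, 5)
y = (X[:, 0] > 0).astype(int)
rf = RandomForestClassifier(n_estimators=50).fit(X, y)
print(sufficient_explanation(rf, X[0], X))    # 通常只需特征 0 即足以维持预测
```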

【10】 Kernel Methods for Multistage Causal Inference: Mediation Analysis and Dynamic Treatment Effects 标题:多阶段因果推断的核方法:中介分析与动态处理效应 链接:https://arxiv.org/abs/2111.03950

作者:Rahul Singh,Liyuan Xu,Arthur Gretton 机构:Department of Economics, Massachusetts Institute of Technology, Cambridge, MA, USA; Gatsby Computational Neuroscience Unit, University College London, London, UK 备注:66 pages. Material in this draft previously appeared in a working paper presented at the 2020 NeurIPS Workshop on ML for Economic Policy (arXiv:2010.04855v1). We have divided the original working paper (arXiv:2010.04855v1) into two projects: one paper focusing on static settings (arXiv:2010.04855) and this paper focusing on dynamic settings 摘要:我们提出了用于中介分析和短期动态处理效应的核岭回归估计量。我们允许处理、协变量和中介变量是离散的或连续的,并且可以是低维、高维或无限维的。我们提出了反事实结果的均值、增量和分布的估计量,其闭式解可以用核矩阵运算表示。对于连续处理情形,我们证明了具有有限样本速率的一致一致性。对于离散处理情形,我们证明了$\sqrt{n}$一致性、高斯近似和半参数有效性。我们先进行模拟,然后估计了美国Job Corps(就业团)计划对弱势青年的中介与动态处理效应。 摘要:We propose kernel ridge regression estimators for mediation analysis and dynamic treatment effects over short horizons. We allow treatments, covariates, and mediators to be discrete or continuous, and low, high, or infinite dimensional. We propose estimators of means, increments, and distributions of counterfactual outcomes with closed form solutions in terms of kernel matrix operations. For the continuous treatment case, we prove uniform consistency with finite sample rates. For the discrete treatment case, we prove root-n consistency, Gaussian approximation, and semiparametric efficiency. We conduct simulations then estimate mediated and dynamic treatment effects of the US Job Corps program for disadvantaged youth.
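摘要提到估计量有"以核矩阵运算表示的闭式解"。作为其基本构件的极简示意(RBF核与正则参数为示例假设,非论文的完整中介/动态估计量),核岭回归的闭式解为 α = (K + nλI)⁻¹y:

```python
import numpy as np

def rbf(A, B, gamma=1.0):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit_predict(X, y, X_test, lam=1e-2):
    K = rbf(X, X)
    alpha = np.linalg.solve(K + lam * len(X) * np.eye(len(X)), y)  # 闭式解
    return rbf(X_test, X) @ alpha

X = np.random.randn(100, 2)
y = np.sin(X[:, 0])
print(krr_fit_predict(X, y, X[:5]))
```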

检测相关(6篇)

【1】 OMD: Orthogonal Malware Detection Using Audio, Image, and Static Features 标题:OMD:使用音频、图像和静态特征的正交恶意软件检测 链接:https://arxiv.org/abs/2111.04710

作者:Lakshmanan Nataraj,Tajuddin Manhar Mohammed,Tejaswi Nanjundaswamy,Satish Chikkagoudar,Shivkumar Chandrasekaran,B. S. Manjunath 机构:Santa Barbara, California, U.S. Naval Research Laboratory, Washington, D.C., Inc. & UC Santa Barbara, B.S. Manjunath 备注:Submitted version - MILCOM 2021 IEEE Military Communications Conference 摘要:随着恶意软件和网络攻击数量的不断增加,需要“正交”网络防御方法,通过检测其他方法无法预测的独特恶意软件样本来补充现有方法。在本文中,我们提出了一种新的正交恶意软件检测(OMD)方法,使用音频描述符、图像相似性描述符和其他静态/统计特征的组合来识别恶意软件。首先,我们展示了当恶意软件二进制文件被表示为音频信号时,音频描述符如何有效地对恶意软件家族进行分类。然后,我们证明了对音频描述符的预测与对图像相似性描述符和其他静态特征的预测是正交的。此外,我们还开发了一个错误分析框架和一个度量来量化新特征集(或类型)相对于其他特征集的正交性。这使我们能够在总体框架中添加新的特性和检测方法。在恶意软件数据集上的实验结果表明,我们的方法为正交恶意软件检测提供了一个健壮的框架。 摘要:With the growing number of malware and cyber attacks, there is a need for "orthogonal" cyber defense approaches, which are complementary to existing methods by detecting unique malware samples that are not predicted by other methods. In this paper, we propose a novel and orthogonal malware detection (OMD) approach to identify malware using a combination of audio descriptors, image similarity descriptors and other static/statistical features. First, we show how audio descriptors are effective in classifying malware families when the malware binaries are represented as audio signals. Then, we show that the predictions made on the audio descriptors are orthogonal to the predictions made on image similarity descriptors and other static features. Further, we develop a framework for error analysis and a metric to quantify how orthogonal a new feature set (or type) is with respect to other feature sets. This allows us to add new features and detection methods to our overall framework. Experimental results on malware datasets show that our approach provides a robust framework for orthogonal malware detection.
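摘要中的"正交性"可以通过互补错误来量化:一个检测器能纠正另一个检测器多大比例的错误。下面是按这一思路写的示意性度量(具体定义为假设,并非论文原文的公式):

```python
import numpy as np

def orthogonality(pred_a, pred_b, y):
    """A 出错的样本中 B 预测正确的比例;越高说明 B 相对 A 越"正交"。"""
    err_a = pred_a != y
    if err_a.sum() == 0:
        return 0.0
    return float((pred_b[err_a] == y[err_a]).mean())

y     = np.array([1, 0, 1, 1, 0, 1])
audio = np.array([1, 0, 0, 1, 0, 1])   # 音频描述符检测器的预测
image = np.array([1, 0, 1, 0, 0, 1])   # 图像相似性描述符检测器的预测
print(orthogonality(audio, image, y))  # 1.0:音频出错的样本图像全部纠正
```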

【2】 CoughTrigger: Earbuds IMU Based Cough Detection Activator Using An Energy-efficient Sensitivity-prioritized Time Series Classifier 标题:CoughTrigger:基于耳机IMU的咳嗽检测激活器,使用节能、灵敏度优先的时间序列分类器 链接:https://arxiv.org/abs/2111.04185

作者:Shibo Zhang,Ebrahim Nemati,Minh Dinh,Nathan Folkman,Tousif Ahmed,Mahbubur Rahman,Jilong Kuang,Nabil Alshurafa,Alex Gao 机构:⋆ Northwestern University, Chicago, IL, USA, † Samsung Research America, Mountain View, CA, USA, ‡ Samsung Design Innovation Center, San Francisco, CA, USA 摘要:持续咳嗽是呼吸系统相关疾病的主要症状。越来越多的研究关注使用可穿戴设备检测咳嗽,特别是在新冠肺炎大流行期间。在所用的各类传感器中,麦克风最广泛地用于检测咳嗽。然而,处理音频信号所需的高功耗阻碍了在电池受限的商用可穿戴产品(如耳塞)上进行持续的基于音频的咳嗽检测。我们提出了CoughTrigger,它利用耳塞中的低功耗传感器,即惯性测量单元(IMU),作为咳嗽检测激活器,来触发更高功耗的传感器进行音频处理和分类。它能够以最小的电池消耗作为常开的待机服务运行,并在IMU检测到候选咳嗽时触发基于音频的咳嗽检测。此外,IMU的使用还带来了提高咳嗽检测特异性的好处。实验在45名受试者身上进行,我们基于IMU的模型在留一受试者(leave-one-subject-out)评估下获得了0.77的AUC分数。我们还通过自由生活数据和设备端实现验证了其有效性。 摘要:Persistent coughs are a major symptom of respiratory-related diseases. Increasing research attention has been paid to detecting coughs using wearables, especially during the COVID-19 pandemic. Among all types of sensors utilized, microphone is most widely used to detect coughs. However, the intense power consumption needed to process audio signals hinders continuous audio-based cough detection on battery-limited commercial wearable products, such as earbuds. We present CoughTrigger, which utilizes a lower-power sensor, an inertial measurement unit (IMU), in earbuds as a cough detection activator to trigger a higher-power sensor for audio processing and classification. It is able to run all-the-time as a standby service with minimal battery consumption and trigger the audio-based cough detection when a candidate cough is detected from IMU. Besides, the use of IMU brings the benefit of improved specificity of cough detection. Experiments are conducted on 45 subjects and our IMU-based model achieved 0.77 AUC score under leave one subject out evaluation. We also validated its effectiveness on free-living data and through on-device implementation.
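两级触发的核心逻辑可以用几行代码示意:低功耗IMU分类器常开,只有得分超过阈值才唤醒高功耗音频管线(分类器与阈值均为占位假设,并非论文实现):

```python
def cough_pipeline(imu_window, audio_buffer, imu_clf, audio_clf, thr=0.5):
    """两级检测:IMU 作为低功耗激活器,音频模型做最终判定。"""
    p_candidate = imu_clf(imu_window)       # 常开:低功耗 IMU 打分
    if p_candidate < thr:                   # 未触发:音频管线保持休眠,省电
        return False
    return audio_clf(audio_buffer) >= thr   # 触发后才运行高功耗音频分类

# 占位分类器演示
detected = cough_pipeline([0.1] * 50, [0.0] * 16000,
                          imu_clf=lambda w: 0.8, audio_clf=lambda a: 0.9)
print(detected)
```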

【3】 Positivity Validation Detection and Explainability via Zero Fraction Multi-Hypothesis Testing and Asymmetrically Pruned Decision Trees 标题:基于零分式多假设检验和非对称修剪决策树的正性验证检测与可解释性 链接:https://arxiv.org/abs/2111.04033

作者:Guy Wolf,Gil Shabat,Hanan Shteingart 机构:Vianai Systems 备注:Talk accepted to Causal Data Science Meeting, 2021 摘要:正性(positivity)是从观测数据进行因果推断的三个条件之一。验证正性的标准方法是分析倾向得分的分布。然而,为了让非专家也能进行因果推断,需要设计一种算法来:(i)检验正性,以及(ii)解释协变量空间中何处缺乏正性。后者可用于提示后续因果分析的局限性,和/或在违反正性的区域鼓励开展实验。本文的贡献首先在于提出自动正性分析这一问题,其次是提出一种基于两步过程的算法。第一步对以协变量为条件的倾向性进行建模,然后使用多重假设检验分析所得分布,以生成正性违反标签。第二步使用非对称修剪的决策树进行解释,并进一步将其转化为非专家能够理解的可读文本。我们在一家大型软件企业的专有数据集上演示了我们的方法。 摘要:Positivity is one of the three conditions for causal inference from observational data. The standard way to validate positivity is to analyze the distribution of propensity. However, to democratize the ability to do causal inference by non-experts, it is required to design an algorithm to (i) test positivity and (ii) explain where in the covariate space positivity is lacking. The latter could be used to either suggest the limitation of further causal analysis and/or encourage experimentation where positivity is violated. The contribution of this paper is first present the problem of automatic positivity analysis and secondly to propose an algorithm based on a two steps process. The first step, models the propensity condition on the covariates and then analyze the latter distribution using multiple hypothesis testing to create positivity violation labels. The second step uses asymmetrically pruned decision trees for explainability. The latter is further converted into readable text a non-expert can understand. We demonstrate our method on a proprietary data-set of a large software enterprise.
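下面按摘要描述的两步流程给出示意(倾向性模型、阈值与树深度均为假设;论文使用多重假设检验打标签,这里以简单的极端倾向得分阈值代替,剪枝也未做非对称处理):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier, export_text

def positivity_analysis(X, treatment, eps=0.05, feature_names=None):
    # 第一步:建模倾向性 P(T=1|X),极端得分标记为正性违反
    prop = LogisticRegression(max_iter=1000).fit(X, treatment).predict_proba(X)[:, 1]
    violation = ((prop < eps) | (prop > 1 - eps)).astype(int)
    # 第二步:用决策树解释违反发生在协变量空间的哪个区域,并导出可读规则
    tree = DecisionTreeClassifier(max_depth=3).fit(X, violation)
    return export_text(tree, feature_names=feature_names)

X = np.random.randn(1000, 2)
t = (X[:, 0] + np.random.randn(1000) > 0).astype(int)
print(positivity_analysis(X, t, feature_names=["x1", "x2"]))
```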

【4】 Towards noise robust trigger-word detection with contrastive learning pre-task for fast on-boarding of new trigger-words 标题:以对比学习预任务实现噪声鲁棒的触发词检测,支持新触发词的快速接入 链接:https://arxiv.org/abs/2111.03971

作者:Sivakumar Balasubramanian,Aditya Jajodia,Gowtham Srinivasan 机构:Samsung Research America, Mountain View, USA 备注:submitted to ICASSP 2022 摘要:触发词检测作为用户与语音助手交流的入口,发挥着重要作用。但是,支持一个特定的词作为触发词需要针对该词进行大量的数据收集、增广和标注,这使得支持新触发词成为一个乏味而耗时的过程。为了解决这个问题,我们探索将对比学习作为预训练任务,帮助检测模型泛化到不同的词和噪声条件。我们探讨了有监督的对比技术,并提出了一种使用从长句音频中切分出的词块的自监督技术。我们表明,在数据可用性较低的情况下,对比预训练技术在新触发词上的效果与传统分类预训练相当。 摘要:Trigger-word detection plays an important role as the entry point of user's communication with voice assistants. But supporting a particular word as a trigger-word involves huge amount of data collection, augmentation and labelling for that word. This makes supporting new trigger-words a tedious and time consuming process. To combat this, we explore the use of contrastive learning as a pre-training task that helps the detection model to generalize to different words and noise conditions. We explore supervised contrastive techniques and also propose a self-supervised technique using chunked words from long sentence audios. We show that the contrastive pre-training techniques have comparable results to a traditional classification pre-training on new trigger words with less data availability.
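作为示意,下面用PyTorch实现常见的NT-Xent对比损失(温度参数为假设):在"同一词块的两个增广视图互为正样本对"的自监督设置下,即可用这类损失进行预训练。这只是对比学习损失的一个通用实例,并非论文的具体配置:

```python
import torch
import torch.nn.functional as F

def nt_xent(z1, z2, tau=0.1):
    """z1, z2: (N, D),同一批词块的两个增广视图的嵌入。"""
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # 2N x D
    sim = z @ z.t() / tau                                # 余弦相似度 / 温度
    n = z1.size(0)
    sim.masked_fill_(torch.eye(2 * n, dtype=torch.bool), float("-inf"))  # 排除自身
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)])    # 正样本对索引
    return F.cross_entropy(sim, targets)

loss = nt_xent(torch.randn(8, 64), torch.randn(8, 64))
```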

【5】 Feature-Level Fusion of Super-App and Telecommunication Alternative Data Sources for Credit Card Fraud Detection 标题:面向信用卡欺诈检测的超级应用与电信替代数据源的特征级融合 链接:https://arxiv.org/abs/2111.03707

作者:Jaime D. Acevedo-Viloria,Sebastián Soriano Pérez,Jesus Solano,David Zarruk-Valencia,Fernando G. Paulin,Alejandro Correa-Bahnsen 机构:Rappi Colombia, Rappi México 备注:Accepted for IEEE ISI 2021 摘要:当没有足够的数据来证实客户身份时,身份盗窃是信贷机构面临的一个主要问题。在超级应用(即涵盖许多不同服务的大型数字平台)中,这个问题更加突出:在一个分支中失去客户往往意味着在其他服务中也失去该客户。在本文中,我们考察了融合超级应用客户信息、手机线路数据和传统信用风险变量的特征级融合方法在身份盗窃型信用卡欺诈早期检测中的有效性。通过所提出的框架,当使用以替代数据与传统征信数据融合为输入的模型时,我们取得了更好的性能,ROC AUC得分达到0.81。我们在一家信贷机构数字平台数据库中约90,000名用户上评估了该方法;评估不仅使用传统的ML指标,还考虑了财务成本。 摘要:Identity theft is a major problem for credit lenders when there's not enough data to corroborate a customer's identity. Among super-apps (large digital platforms that encompass many different services) this problem is even more relevant; losing a client in one branch can often mean losing them in other services. In this paper, we review the effectiveness of a feature-level fusion of super-app customer information, mobile phone line data, and traditional credit risk variables for the early detection of identity theft credit card fraud. Through the proposed framework, we achieved better performance when using a model whose input is a fusion of alternative data and traditional credit bureau data, achieving a ROC AUC score of 0.81. We evaluate our approach over approximately 90,000 users from a credit lender's digital platform database. The evaluation was performed using not only traditional ML metrics but the financial costs as well.

【6】 Disaster mapping from satellites: damage detection with crowdsourced point labels 标题:来自卫星的灾害测绘:使用众包点标签进行损害检测 链接:https://arxiv.org/abs/2111.03693

作者:Danil Kuzin,Olga Isupova,Brooke D. Simmons,Steven Reece 机构:Department of Computer Science, Department of Physics, Lancaster University, Lancaster, UK, University of Bath, Bath, UK, Department of Engineering Science, University of Oxford, Oxford, UK 备注:3rd Workshop on Artificial Intelligence for Humanitarian Assistance and Disaster Response at NeurIPS 2021 摘要:灾害事件发生后立即可用的高分辨率卫星图像对于响应规划至关重要,因为它有助于广泛了解建筑物损毁、洪水和道路受阻等关键基础设施状态。在这种规模上绘制损毁地图需要数百个专家人时。然而,结合众包与深度学习的最新进展,可将所需的实时工作量减少到仅几个小时。要求志愿者放置点标记而非实际受损区域的形状,可显著减少灾害响应期间所需的分析时间。然而,不同志愿者的标注可能不一致。这项工作提出了聚合潜在不一致损毁标记的方法,用以训练神经网络损毁检测器。 摘要:High-resolution satellite imagery available immediately after disaster events is crucial for response planning as it facilitates broad situational awareness of critical infrastructure status such as building damage, flooding, and obstructions to access routes. Damage mapping at this scale would require hundreds of expert person-hours. However, a combination of crowdsourcing and recent advances in deep learning reduces the effort needed to just a few hours in real time. Asking volunteers to place point marks, as opposed to shapes of actual damaged areas, significantly decreases the required analysis time for response during the disaster. However, different volunteers may be inconsistent in their marking. This work presents methods for aggregating potentially inconsistent damage marks to train a neural network damage detector.

分类|识别(7篇)

【1】 A Survey of Human Activity Recognition in Smart Homes Based on IoT Sensors Algorithms: Taxonomies, Challenges, and Opportunities with Deep Learning 标题:基于物联网传感器算法的智能家居人类活动识别综述:分类、挑战和深度学习带来的机遇 链接:https://arxiv.org/abs/2111.04418

作者:Damien Bouchabou,Sao Mai Nguyen,Christophe Lohr,Benoit Leduc,Ioannis Kanellos 备注:None 摘要:物联网(IoT)技术的最新进展和传感器成本的降低推动了智能家居等智能环境的发展。智能家居可以提供家居协助服务,以改善居民(特别是老年人和需要照护者)的生活质量、自主性和健康。要提供此类服务,智能家居必须能够理解住户的日常活动。智能家居中人类活动识别的技术每天都在进步,但新的挑战也不断出现。在本文中,我们介绍了通过环境传感器在智能家居中进行人类活动识别这一领域的最新算法、工作、挑战和分类体系。此外,由于智能家居中的活动识别还是一个年轻的领域,我们指出了该领域特有的问题以及缺失和亟需的贡献,并提出了方向、研究机会和解决方案,以加速这一领域的进展。 摘要:Recent advances in Internet of Things (IoT) technologies and the reduction in the cost of sensors have encouraged the development of smart environments, such as smart homes. Smart homes can offer home assistance services to improve the quality of life, autonomy and health of their residents, especially for the elderly and dependent. To provide such services, a smart home must be able to understand the daily activities of its residents. Techniques for recognizing human activity in smart homes are advancing daily. But new challenges are emerging every day. In this paper, we present recent algorithms, works, challenges and taxonomy of the field of human activity recognition in a smart home through ambient sensors. Moreover, since activity recognition in smart homes is a young field, we raise specific problems, missing and needed contributions. But also propose directions, research opportunities and solutions to accelerate advances in this field.

【2】 A Comparison of Deep Learning Architectures for Optical Galaxy Morphology Classification 标题:光学星系形态分类的深度学习体系结构比较 链接:https://arxiv.org/abs/2111.04353

作者:Ezra Fielding,Clement N. Nyirenda,Mattia Vaccari 机构:Department of Computer Science, University of the Western Cape, Cape Town, South Africa; Department of Physics & Astronomy 摘要:星系形态分类对于理解星系的形成和演化起着至关重要的作用。传统上,这一过程由人工完成。深度学习技术的出现为这一过程的自动化提供了空间。因此,本文比较了多种深度学习体系结构,以确定哪种最适合光学星系形态分类。采用Walmsley等人于2021年提出的模型训练方法,使用Zoobot Python库训练模型,以EfficientNet B0、DenseNet121和ResNet50作为核心模型架构,预测志愿者做出的Galaxy Zoo DECaLS决策树响应。然后使用预测结果为每个决策树问题生成准确性度量,以确定体系结构的性能。结果发现,DenseNet121在合理的训练时间内取得了最佳的准确性。未来,对更多深度学习体系结构的进一步测试可能会有所助益。 摘要:The classification of galaxy morphology plays a crucial role in understanding galaxy formation and evolution. Traditionally, this process is done manually. The emergence of deep learning techniques has given room for the automation of this process. As such, this paper offers a comparison of deep learning architectures to determine which is best suited for optical galaxy morphology classification. Adapting the model training method proposed by Walmsley et al in 2021, the Zoobot Python library is used to train models to predict Galaxy Zoo DECaLS decision tree responses, made by volunteers, using EfficientNet B0, DenseNet121 and ResNet50 as core model architectures. The predicted results are then used to generate accuracy metrics per decision tree question to determine architecture performance. DenseNet121 was found to produce the best results, in terms of accuracy, with a reasonable training time. In future, further testing with more deep learning architectures could prove beneficial.
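这类比较的核心操作是替换主干网络。下面用tf.keras给出示意(输入尺寸、输出头个数与训练细节均为假设,并非Zoobot库的原始接口):

```python
import tensorflow as tf

def build_model(backbone_name="DenseNet121", n_outputs=34):
    backbones = {
        "EfficientNetB0": tf.keras.applications.EfficientNetB0,
        "DenseNet121":    tf.keras.applications.DenseNet121,
        "ResNet50":       tf.keras.applications.ResNet50,
    }
    base = backbones[backbone_name](include_top=False, weights=None,
                                    input_shape=(224, 224, 3), pooling="avg")
    head = tf.keras.layers.Dense(n_outputs, activation="sigmoid")(base.output)
    return tf.keras.Model(base.input, head)

model = build_model("DenseNet121")   # 换一个键即可比较不同主干架构
```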

【3】 CubeLearn: End-to-end Learning for Human Motion Recognition from Raw mmWave Radar Signals 标题:CubeLearn:用于从原始毫米波雷达信号中识别人体运动的端到端学习 链接:https://arxiv.org/abs/2111.03976

作者:Peijun Zhao,Chris Xiaoxuan Lu,Bing Wang,Niki Trigoni,Andrew Markham 机构:University of Oxford, Oxford, United Kingdom, University of Edinburgh, Edinburgh, United Kingdom 摘要:近年来,毫米波FMCW雷达因其以人为中心的应用(如人体手势/活动识别)引起了极大的研究兴趣。现有的大多数管道都是基于传统离散傅立叶变换(DFT)预处理与深度神经网络分类器的混合方法构建的,以前的大部分工作都集中在设计下游分类器以提高总体精度。在这项工作中,我们后退一步,审视预处理模块。为了避免传统DFT预处理的缺点,我们提出了一个可学习的预处理模块CubeLearn,用于直接从原始雷达信号中提取特征,并为毫米波FMCW雷达运动识别应用构建端到端的深度神经网络。大量实验表明,我们的CubeLearn模块持续提高了不同管道的分类精度,尤其有利于那些先前较弱的模型。我们对所提出模块的初始化方法和结构进行了消融研究,并评估了其在PC和边缘设备上的运行时间。这项工作同时也是对数据立方体切片不同方法的比较。通过任务无关的设计,我们迈出了面向雷达识别问题通用端到端解决方案的第一步。 摘要:mmWave FMCW radar has attracted huge amount of research interest for human-centered applications in recent years, such as human gesture/activity recognition. Most existing pipelines are built upon conventional Discrete Fourier Transform (DFT) pre-processing and deep neural network classifier hybrid methods, with a majority of previous works focusing on designing the downstream classifier to improve overall accuracy. In this work, we take a step back and look at the pre-processing module. To avoid the drawbacks of conventional DFT pre-processing, we propose a learnable pre-processing module, named CubeLearn, to directly extract features from raw radar signal and build an end-to-end deep neural network for mmWave FMCW radar motion recognition applications. Extensive experiments show that our CubeLearn module consistently improves the classification accuracies of different pipelines, especially benefiting those previously weaker models. We provide ablation studies on initialization methods and structure of the proposed module, as well as an evaluation of the running time on PC and edge devices. This work also serves as a comparison of different approaches towards data cube slicing. Through our task agnostic design, we propose a first step towards a generic end-to-end solution for radar recognition problems.
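"可学习预处理"的一种常见实现思路是:用以DFT矩阵初始化的线性层替换固定的DFT,再随下游任务端到端微调。以下是这一思想的示意性PyTorch草图(并非论文的原始实现,初始化与输出形式均为假设):

```python
import torch
import torch.nn as nn

class LearnableDFT(nn.Module):
    """以 DFT 权重初始化的线性变换,可随下游任务端到端微调。"""
    def __init__(self, n):
        super().__init__()
        k = torch.arange(n).float()
        ang = -2 * torch.pi * torch.outer(k, k) / n
        self.w_re = nn.Parameter(torch.cos(ang))   # 实部权重
        self.w_im = nn.Parameter(torch.sin(ang))   # 虚部权重

    def forward(self, x):                          # x: (batch, n) 实值雷达采样
        re, im = x @ self.w_re.t(), x @ self.w_im.t()
        return torch.sqrt(re ** 2 + im ** 2 + 1e-8)  # 幅度谱,初始时等价于 |DFT|

spec = LearnableDFT(64)(torch.randn(4, 64))        # (4, 64) 的可学习"频谱"特征
```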

【4】 Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective 标题:基于先验视角的长尾视觉识别校准模型研究 链接:https://arxiv.org/abs/2111.03874

作者:Zhengzhuo Xu,Zenghao Chai,Chun Yuan 机构:Shenzhen International Graduate School, Tsinghua University, Peng Cheng Laboratory 备注:Accepted at NeurIPS 2021 摘要:现实世界的数据普遍面临严重的类别不平衡问题,并呈现长尾分布,即大多数标签只与有限的实例相关联。由此类数据集监督的朴素(naive)模型会偏向占主导的标签,遭遇严重的泛化挑战,并且校准能力差。我们从先验(prior)的角度提出两种新方法来缓解这一困境。首先,我们推导了一种面向平衡的数据增强方法,称为均匀混合(UniMix),以促进长尾场景下的mixup,它采用有利于少数类的改进混合因子和采样器。其次,受贝叶斯理论的启发,我们刻画了贝叶斯偏差(Bayias),一种由先验不一致引起的固有偏差,并将其作为对标准交叉熵损失的修正加以补偿。我们进一步从理论和经验上证明,所提出的两种方法都能确保分类校准。大量实验验证了我们的策略有助于得到校准更好的模型,并且二者的组合在CIFAR-LT、ImageNet-LT和iNaturalist 2018上实现了最先进的性能。 摘要:Real-world data universally confronts a severe class-imbalance problem and exhibits a long-tailed distribution, i.e., most labels are associated with limited instances. The na\"ive models supervised by such datasets would prefer dominant labels, encounter a serious generalization challenge and become poorly calibrated. We propose two novel methods from the prior perspective to alleviate this dilemma. First, we deduce a balance-oriented data augmentation named Uniform Mixup (UniMix) to promote mixup in long-tailed scenarios, which adopts advanced mixing factor and sampler in favor of the minority. Second, motivated by the Bayesian theory, we figure out the Bayes Bias (Bayias), an inherent bias caused by the inconsistency of prior, and compensate it as a modification on standard cross-entropy loss. We further prove that both the proposed methods ensure the classification calibration theoretically and empirically. Extensive experiments verify that our strategies contribute to a better-calibrated model, and their combination achieves state-of-the-art performance on CIFAR-LT, ImageNet-LT, and iNaturalist 2018.
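对先验不一致进行补偿,在实现上与"logit调整"一脉相承:在交叉熵中按训练集类先验修正logits。下面给出这一思想的示意(细节可能与论文的Bayias公式不同,仅作说明;类计数为占位数据):

```python
import torch
import torch.nn.functional as F

def prior_compensated_ce(logits, targets, class_counts):
    """按训练集类先验补偿 logits 后再计算交叉熵(Bayias 式修正的示意)。"""
    prior = class_counts / class_counts.sum()
    adjusted = logits + torch.log(prior + 1e-12)   # 训练时加上 log 先验项
    return F.cross_entropy(adjusted, targets)

logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))
counts = torch.tensor([5000., 2000., 800., 300., 120., 60., 30., 15., 8., 4.])
loss = prior_compensated_ce(logits, targets, counts)
```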

【5】 Neyman-Pearson Multi-class Classification via Cost-sensitive Learning 标题:基于代价敏感学习的Neyman-Pearson多类分类 链接:https://arxiv.org/abs/2111.04597

作者:Ye Tian,Yang Feng 机构:Department of Statistics, Columbia University, New York, NY , USA, Department of Biostatistics, School of Global Public Health, New York University 备注:44 pages, 6 figures 摘要:大多数现有的分类方法旨在最大限度地降低总体误分类率,然而,在应用中,不同类型的错误可能会产生不同的后果。为了考虑这一不对称问题,已经开发了两种流行的范式,即Neyman-Pearson(NP)范式和成本敏感(CS)范式。与CS范式相比,NP范式不需要成本规范。以往关于NP范式的研究大多集中在二元情况下。在这项工作中,我们通过将多类NP问题与CS问题联系起来来研究它,并提出了两种算法。我们将NP-oracle不等式和一致性从二元情形推广到多类情形,并证明了我们的两个算法在一定条件下都具有这些性质。仿真和实际数据研究证明了算法的有效性。据我们所知,这是第一个通过具有理论保证的代价敏感学习技术来解决多类NP问题的工作。所提出的算法在CRAN上的R包“npcs”中实现。 摘要:Most existing classification methods aim to minimize the overall misclassification error rate, however, in applications, different types of errors can have different consequences. To take into account this asymmetry issue, two popular paradigms have been developed, namely the Neyman-Pearson (NP) paradigm and cost-sensitive (CS) paradigm. Compared to CS paradigm, NP paradigm does not require a specification of costs. Most previous works on NP paradigm focused on the binary case. In this work, we study the multi-class NP problem by connecting it to the CS problem, and propose two algorithms. We extend the NP oracle inequalities and consistency from the binary case to the multi-class case, and show that our two algorithms enjoy these properties under certain conditions. The simulation and real data studies demonstrate the effectiveness of our algorithms. To our knowledge, this is the first work to solve the multi-class NP problem via cost-sensitive learning techniques with theoretical guarantees. The proposed algorithms are implemented in the R package "npcs" on CRAN.

【6】 Human Activity Recognition using Attribute-Based Neural Networks and Context Information 标题:基于属性神经网络和上下文信息的人体活动识别 链接:https://arxiv.org/abs/2111.04564

作者:Stefan Lüdtke,Fernando Moya Rueda,Waqas Ahmed,Gernot A. Fink,Thomas Kirste 机构:Institute of Visual & Analytic Computing, University of Rostock, Germany, Department of Computer Science, TU Dortmund University, Germany 备注:3rd International Workshop on Deep Learning for Human Activity Recognition 摘要:我们研究手工作业过程(如仓库订单拣选)中基于可穿戴传感器数据的人类活动识别(HAR)。这类结构化领域通常可以划分为不同的过程步骤,例如打包或运输。每个过程步骤可能具有不同的活动类别先验分布(例如站立或行走)和不同的系统动力学。在这里,我们展示了如何将这类上下文信息系统地集成到基于深度神经网络的HAR系统中。具体而言,我们提出了一种混合体系结构:一个深度神经网络从原始传感器数据估计高层运动描述符(属性),一个浅层分类器则根据估计的属性和可选的上下文信息(如当前执行的过程步骤)预测活动类别。我们的实验表明,与最先进的方法相比,所提出的体系结构提高了HAR性能。此外,我们还表明,当纳入有关过程步骤的信息时,即使该信息只是部分正确,HAR性能也能进一步提高。 摘要:We consider human activity recognition (HAR) from wearable sensor data in manual-work processes, like warehouse order-picking. Such structured domains can often be partitioned into distinct process steps, e.g., packaging or transporting. Each process step can have a different prior distribution over activity classes, e.g., standing or walking, and different system dynamics. Here, we show how such context information can be integrated systematically into a deep neural network-based HAR system. Specifically, we propose a hybrid architecture that combines a deep neural network-that estimates high-level movement descriptors, attributes, from the raw-sensor data-and a shallow classifier, which predicts activity classes from the estimated attributes and (optional) context information, like the currently executed process step. We empirically show that our proposed architecture increases HAR performance, compared to state-of-the-art methods. Additionally, we show that HAR performance can be further increased when information about process steps is incorporated, even when that information is only partially correct.

【7】 A new baseline for retinal vessel segmentation: Numerical identification and correction of methodological inconsistencies affecting 100+ papers 标题:视网膜血管分割的新基线:对影响100多篇论文的方法不一致的数值识别与校正 链接:https://arxiv.org/abs/2111.03853

作者:György Kovács,Attila Fazekas 机构:Analytical Minds Ltd., Árpád street, Beregsurány, Hungary, University of Debrecen, P.O.BOX, Debrecen, Hungary 摘要:在过去15年中,视网膜图像中的血管分割已成为医学成像领域的一个研究热点,已发表了数百种算法。血管分割技术事实上的基准数据集之一是DRIVE数据集。由于DRIVE包含预定义的训练图像与测试图像划分,各种分割技术已发表的性能结果理应为算法提供可靠的排名。我们将100多篇论文纳入研究,对已发表性能分数的一致性进行了详细的数值分析。我们发现报告的分数中存在与视野(FoV)使用方式相关的不一致,而这对性能分数有显著影响。我们尝试用数值技术消除这些偏差,以更真实地呈现研究现状。基于这些结果,我们得出了若干发现,最显著的是:尽管DRIVE有定义明确的测试集,但已发表论文中的大多数排名都基于不可比的数字;与文献中报道的近乎完美的准确度分数相反,迄今为止在FoV区域内取得的最高准确度为0.9582,仅比人类标注者高1%。我们为识别和消除评估偏差而开发的方法可以很容易地应用于可能出现类似问题的其他领域。 摘要:In the last 15 years, the segmentation of vessels in retinal images has become an intensively researched problem in medical imaging, with hundreds of algorithms published. One of the de facto benchmarking data sets of vessel segmentation techniques is the DRIVE data set. Since DRIVE contains a predefined split of training and test images, the published performance results of the various segmentation techniques should provide a reliable ranking of the algorithms. Including more than 100 papers in the study, we performed a detailed numerical analysis of the coherence of the published performance scores. We found inconsistencies in the reported scores related to the use of the field of view (FoV), which has a significant impact on the performance scores. We attempted to eliminate the biases using numerical techniques to provide a more realistic picture of the state of the art. Based on the results, we have formulated several findings, most notably: despite the well-defined test set of DRIVE, most rankings in published papers are based on non-comparable figures; in contrast to the near-perfect accuracy scores reported in the literature, the highest accuracy score achieved to date is 0.9582 in the FoV region, which is 1% higher than that of human annotators.  The methods we have developed for identifying and eliminating the evaluation biases can be easily applied to other domains where similar problems may arise.
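摘要指出的核心偏差在于:像素级指标是在整幅图像上统计,还是仅在视野(FoV)掩码内统计。下面的示意函数对比两种统计口径(数据与掩码为占位构造,仅用于演示口径差异):

```python
import numpy as np

def accuracy(pred, gt, fov_mask=None):
    """pred/gt/fov_mask: 同尺寸二值数组;fov_mask=None 时在整幅图上统计。"""
    if fov_mask is None:
        return float((pred == gt).mean())
    return float((pred[fov_mask] == gt[fov_mask]).mean())

pred = np.zeros((584, 565), bool)       # 一个"全背景"的平凡预测
gt = np.zeros_like(pred)
gt[200:210, 200:400] = True             # 少量血管像素
fov = np.zeros_like(pred)
fov[100:500, 100:470] = True            # 圆形视野的矩形近似
print(accuracy(pred, gt), accuracy(pred, gt, fov))  # FoV 外的大片背景会抬高前者
```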

表征(3篇)

【1】 Learning Context-Aware Representations of Subtrees 标题:学习子树的上下文感知表示 链接:https://arxiv.org/abs/2111.04308

作者:Cedric Cook 机构:Klarna Bank AB(指导:Dr. Stefan Magureanu);Artificial Intelligence Laboratory, EPFL(指导:Prof. Boi Faltings) 备注:36 pages, 12 Figures. This work was carried out as a Master Thesis at Klarna Bank AB, under supervision of Stefan Magureanu 摘要:这篇论文研究复杂结构化数据的高效表示学习问题,并将其自然地应用于网页和网页元素分类。我们假设网页中元素周围的上下文对该问题具有很高的价值,而目前尚未得到充分利用。本文旨在通过同时考虑web元素的上下文,解决将其作为DOM树子树进行分类的问题。为此,我们首先讨论当前在结构上工作的专家知识系统,例如Tree-LSTM;然后,我们提出该模型的上下文感知扩展。我们表明,新模型在多类网页分类任务中取得了0.7973的平均F1分数。该模型能为各种子树生成更好的表示,可用于元素分类、Web上强化学习的状态估计器等应用。 摘要:This thesis tackles the problem of learning efficient representations of complex, structured data with a natural application to web page and element classification. We hypothesise that the context around the element inside the web page is of high value to the problem and is currently under exploited. This thesis aims to solve the problem of classifying web elements as subtrees of a DOM tree by also considering their context. To achieve this, first we discuss current expert knowledge systems that work on structures, such as Tree-LSTM. Then, we propose context-aware extensions to this model. We show that the new model achieves an average F1-score of 0.7973 on a multi-class web classification task. This model generates better representations for various subtrees and may be used for applications such element classification, state estimators in reinforcement learning over the Web and more.

【2】 Deep Marching Tetrahedra: a Hybrid Representation for High-Resolution 3D Shape Synthesis 标题:深度行进四面体:一种用于高分辨率三维形状合成的混合表示 链接:https://arxiv.org/abs/2111.04276

作者:Tianchang Shen,Jun Gao,Kangxue Yin,Ming-Yu Liu,Sanja Fidler 机构:NVIDIA, University of Toronto, Vector Institute 摘要:我们介绍了DMTet,一种深度3D条件生成模型,它可以使用简单的用户指南(如粗体素)合成高分辨率3D形状。它结合了隐式和显式3D表示的优点,利用了一种新的混合3D表示。与当前的隐式方法相比,DMTet直接对重建曲面进行优化,从而使我们能够以较少的伪影合成更精细的几何细节。与直接生成显式表示(如网格)的深层3D生成模型不同,我们的模型可以合成具有任意拓扑的形状。DMTet的核心包括一个可变形四面体网格,该网格编码离散化的有符号距离函数,以及一个可微行进四面体层,该层将隐式有符号距离表示转换为显式曲面网格表示。这种组合允许曲面几何和拓扑的联合优化,以及使用曲面网格上明确定义的重建和对抗损失生成细分层次。我们的方法明显优于现有的基于粗体素输入的条件形状合成工作,这些粗体素输入是在复杂的3D动物形状数据集上训练的。项目页面:https://nv-tlabs.github.io/DMTet/. 摘要:We introduce DMTet, a deep 3D conditional generative model that can synthesize high-resolution 3D shapes using simple user guides such as coarse voxels. It marries the merits of implicit and explicit 3D representations by leveraging a novel hybrid 3D representation. Compared to the current implicit approaches, which are trained to regress the signed distance values, DMTet directly optimizes for the reconstructed surface, which enables us to synthesize finer geometric details with fewer artifacts. Unlike deep 3D generative models that directly generate explicit representations such as meshes, our model can synthesize shapes with arbitrary topology. The core of DMTet includes a deformable tetrahedral grid that encodes a discretized signed distance function and a differentiable marching tetrahedra layer that converts the implicit signed distance representation to the explicit surface mesh representation. This combination allows joint optimization of the surface geometry and topology as well as generation of the hierarchy of subdivisions using reconstruction and adversarial losses defined explicitly on the surface mesh. Our approach significantly outperforms existing work on conditional shape synthesis from coarse voxel inputs, trained on a dataset of complex 3D animal shapes. Project page: https://nv-tlabs.github.io/DMTet/.

【3】 Representation Learning via Quantum Neural Tangent Kernels 标题:基于量子神经正切核的表示学习 链接:https://arxiv.org/abs/2111.04225

作者:Junyu Liu,Francesco Tacchino,Jennifer R. Glick,Liang Jiang,Antonio Mezzacapo 机构:Pritzker School of Molecular Engineering, The University of Chicago, Chicago, IL , USA, Chicago Quantum Exchange, Chicago, IL , USA, Kadanoff Center for Theoretical Physics, The University of Chicago, Chicago, IL , USA 备注:40=11+29 pages, many figures 摘要:变分量子电路被用于量子机器学习和变分量子模拟任务。如何设计好的变分电路,或预测它们在给定学习或优化任务中的表现,目前仍不清楚。在这里,我们利用神经正切核理论分析变分量子电路,以讨论这些问题。我们定义了量子神经正切核,并推导了优化和学习任务中相关损失函数的动力学方程。我们解析地求解了冻结极限(即惰性训练区域)中的动力学,此时变分角变化缓慢,线性扰动已足够好。我们将分析扩展到动力学设定,纳入变分角的二次修正。然后我们考虑混合量子-经典体系结构,并为混合核定义了大宽度极限,表明混合量子-经典神经网络可以近似为高斯的。本文给出的结果表明了在哪些极限下,可以对用于量子机器学习和优化问题的变分量子电路的训练动力学进行解析理解。这些分析结果得到了量子机器学习实验数值模拟的支持。 摘要:Variational quantum circuits are used in quantum machine learning and variational quantum simulation tasks. Designing good variational circuits or predicting how well they perform for given learning or optimization tasks is still unclear. Here we discuss these problems, analyzing variational quantum circuits using the theory of neural tangent kernels. We define quantum neural tangent kernels, and derive dynamical equations for their associated loss function in optimization and learning tasks. We analytically solve the dynamics in the frozen limit, or lazy training regime, where variational angles change slowly and a linear perturbation is good enough. We extend the analysis to a dynamical setting, including quadratic corrections in the variational angles. We then consider hybrid quantum-classical architecture and define a large width limit for hybrid kernels, showing that a hybrid quantum-classical neural network can be approximately Gaussian. The results presented here show limits for which analytical understandings of the training dynamics for variational quantum circuits, used for quantum machine learning and optimization problems, are possible. These analytical results are supported by numerical simulations of quantum machine learning experiments.

编码器(1篇)

【1】 A Review of Location Encoding for GeoAI: Methods and Applications 标题:GeoAI中的位置编码:方法与应用综述 链接:https://arxiv.org/abs/2111.04006

作者:Gengchen Mai,Krzysztof Janowicz,Yingjie Hu,Song Gao,Bo Yan,Rui Zhu,Ling Cai,Ni Lao 机构:STKO Lab, Department of Geography, UC Santa Barbara, CA, USA,;, Center for Spatial Studies, UC Santa Barbara, CA, USA,;, Department of Computer Science, Stanford University, CA, USA,;, GeoAI Lab, Department of Geography, University at Buffalo, NY, USA,; 备注:32 Pages, 5 Figures, Accepted to International Journal of Geographical Information Science, 2021 摘要:在更广泛的地球科学中,人工智能模型的一个共同需求是表示和编码各种类型的空间数据,例如点(例如兴趣点)、多段线(例如轨迹)、多边形(例如行政区域)、图形(例如运输网络)或光栅(例如遥感图像),在一个隐藏的嵌入空间,以便他们可以很容易地纳入深入学习模型。一个基本步骤是将单点位置编码到嵌入空间中,这样嵌入对于下游机器学习模型(如支持向量机和神经网络)是学习友好的。我们称这个过程为位置编码。然而,对于位置编码的概念、其潜在应用和需要解决的关键挑战缺乏系统的回顾。本文旨在填补这一空白。首先给出了位置编码的形式化定义,并从机器学习的角度讨论了地理人工智能研究中位置编码的必要性。接下来,我们对当前位置编码研究的现状进行了全面的调查和讨论。我们根据输入和编码方法将位置编码模型分为不同的类别,并根据它们是否参数化、多尺度、距离保持和方向感知对它们进行比较。我们证明了现有的位置编码模型可以统一在一个共享的公式框架下。我们还讨论了位置编码在不同类型空间数据中的应用。最后,我们指出了位置编码研究中需要解决的几个挑战。 摘要:A common need for artificial intelligence models in the broader geoscience is to represent and encode various types of spatial data, such as points (e.g., points of interest), polylines (e.g., trajectories), polygons (e.g., administrative regions), graphs (e.g., transportation networks), or rasters (e.g., remote sensing images), in a hidden embedding space so that they can be readily incorporated into deep learning models. One fundamental step is to encode a single point location into an embedding space, such that this embedding is learning-friendly for downstream machine learning models such as support vector machines and neural networks. We call this process location encoding. However, there lacks a systematic review on the concept of location encoding, its potential applications, and key challenges that need to be addressed. This paper aims to fill this gap. We first provide a formal definition of location encoding, and discuss the necessity of location encoding for GeoAI research from a machine learning perspective. Next, we provide a comprehensive survey and discussion about the current landscape of location encoding research. We classify location encoding models into different categories based on their inputs and encoding methods, and compare them based on whether they are parametric, multi-scale, distance preserving, and direction aware. We demonstrate that existing location encoding models can be unified under a shared formulation framework. We also discuss the application of location encoding for different types of spatial data. Finally, we point out several challenges in location encoding research that need to be solved in the future.
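位置编码中最常见的一类是多尺度正弦编码(Space2Vec等工作的基本构件,也是本综述讨论的多尺度编码的代表)。下面给出一个示意实现(尺度个数与波长范围均为示例假设):

```python
import numpy as np

def multiscale_loc_encode(coords, n_scales=8, min_lambda=1.0, max_lambda=1e4):
    """coords: (N, 2) 平面坐标;返回 (N, 2*2*n_scales) 的多尺度正弦编码。"""
    lambdas = np.geomspace(min_lambda, max_lambda, n_scales)  # 几何级数的尺度
    feats = []
    for lam in lambdas:
        for d in range(coords.shape[1]):
            feats.append(np.sin(2 * np.pi * coords[:, d] / lam))
            feats.append(np.cos(2 * np.pi * coords[:, d] / lam))
    return np.stack(feats, axis=1)

emb = multiscale_loc_encode(np.random.rand(5, 2) * 1e4)
print(emb.shape)   # (5, 32):可直接馈入下游 SVM 或神经网络
```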

优化|敛散性(9篇)

【1】 Nonnegative Tensor Completion via Integer Optimization 标题:基于整数优化的非负张量补全 链接:https://arxiv.org/abs/2111.04580

作者:Caleb Bugg,Chen Chen,Anil Aswani 机构:University of California, Berkeley, The Ohio State University 摘要:与矩阵补全不同,迄今为止尚无算法被证明能在张量补全问题上达到信息论样本复杂度率。本文针对非负张量补全这一特殊情形提出了一种新算法。我们证明了该算法在达到信息论速率的同时,以关于数值容差呈线性的oracle步数收敛。我们的方法是利用我们所构造的一个特定0-1多面体的规范函数(gauge),为非负张量定义一个新的范数。由于该范数是用0-1多面体定义的,这意味着我们可以使用整数线性规划来求解该多面体上的线性分离问题。我们将这一见解与Frank-Wolfe算法的一个变体相结合,构造了我们的数值算法,并通过实验证明了其有效性和可扩展性。 摘要:Unlike matrix completion, no algorithm for the tensor completion problem has so far been shown to achieve the information-theoretic sample complexity rate. This paper develops a new algorithm for the special case of completion for nonnegative tensors. We prove that our algorithm converges in a linear (in numerical tolerance) number of oracle steps, while achieving the information-theoretic rate. Our approach is to define a new norm for nonnegative tensors using the gauge of a specific 0-1 polytope that we construct. Because the norm is defined using a 0-1 polytope, this means we can use integer linear programming to solve linear separation problems over the polytope. We combine this insight with a variant of the Frank-Wolfe algorithm to construct our numerical algorithm, and we demonstrate its effectiveness and scalability through experiments.
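该算法的骨架是带线性最小化oracle(LMO)的Frank-Wolfe迭代,论文中LMO由0-1多面体上的整数线性规划实现。下面给出通用骨架的示意(LMO以占位函数表示,玩具例子用单纯形代替论文的具体多面体):

```python
import numpy as np

def frank_wolfe(grad_f, lmo, x0, n_iters=100):
    """grad_f: 梯度函数;lmo(g): 返回 argmin_{s in 可行集} <g, s>。"""
    x = x0
    for t in range(n_iters):
        g = grad_f(x)
        s = lmo(g)                      # 论文中此步由整数线性规划(线性分离)求解
        gamma = 2.0 / (t + 2.0)         # 经典步长
        x = (1 - gamma) * x + gamma * s
    return x

# 玩具示例:在单纯形上最小化 ||x - b||^2
b = np.array([0.2, 0.5, 0.3])
lmo = lambda g: np.eye(3)[np.argmin(g)]   # 单纯形的 LMO 总是返回一个顶点
x = frank_wolfe(lambda x: 2 * (x - b), lmo, np.ones(3) / 3)
print(np.round(x, 3))                     # 接近 b
```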

【2】 BlueFog: Make Decentralized Algorithms Practical for Optimization and Deep Learning 标题:BlueFog:使分散的算法适用于优化和深度学习 链接:https://arxiv.org/abs/2111.04287

作者:Bicheng Ying,Kun Yuan,Hanbin Hu,Yiming Chen,Wotao Yin 机构:University of California, Los Angeles ,Google Inc., DAMO Academy, Alibaba Group ,University of California, Santa Barbara 摘要:分散算法是一种计算形式,它通过依赖于直接连接的代理之间的低成本通信的局部动态来实现全局目标。在涉及分布式数据集的大规模优化任务中,与具有中心节点的分布式算法相比,分散算法表现出强大的性能,有时甚至更优越。近年来,开发用于深度学习的分散算法引起了广泛关注。它们被认为是使用参数服务器或环Allreduce协议的低通信开销替代方案。然而,由于缺乏易于使用和高效的软件包,大多数分散算法只能停留在纸面上。为了填补这一空白,我们引入了BlueFog,这是一个python库,用于各种分散算法的简单、高性能实现。基于对各种通信操作的统一抽象,BlueFog提供了直观的界面来实现一系列分散算法,从使用静态无向图进行同步操作的算法到使用动态有向图进行异步操作的算法。BlueFog还采用了几种系统级加速技术,以进一步优化深度学习任务的性能。在主流DNN训练任务中,BlueFog的吞吐量要高得多,与基于Ring Allreduce的最先进分布式深度学习软件包Horovod相比,其总加速比为$1.2\times\sim 1.8\times$。BlueFog是开源的https://github.com/Bluefog-Lib/bluefog. 摘要:Decentralized algorithm is a form of computation that achieves a global goal through local dynamics that relies on low-cost communication between directly-connected agents. On large-scale optimization tasks involving distributed datasets, decentralized algorithms have shown strong, sometimes superior, performance over distributed algorithms with a central node. Recently, developing decentralized algorithms for deep learning has attracted great attention. They are considered as low-communication-overhead alternatives to those using a parameter server or the Ring-Allreduce protocol. However, the lack of an easy-to-use and efficient software package has kept most decentralized algorithms merely on paper. To fill the gap, we introduce BlueFog, a python library for straightforward, high-performance implementations of diverse decentralized algorithms. Based on a unified abstraction of various communication operations, BlueFog offers intuitive interfaces to implement a spectrum of decentralized algorithms, from those using a static, undirected graph for synchronous operations to those using dynamic and directed graphs for asynchronous operations. BlueFog also adopts several system-level acceleration techniques to further optimize the performance on the deep learning tasks. On mainstream DNN training tasks, BlueFog reaches a much higher throughput and achieves an overall $1.2\times \sim 1.8\times$ speedup over Horovod, a state-of-the-art distributed deep learning package based on Ring-Allreduce. BlueFog is open source at https://github.com/Bluefog-Lib/bluefog.

【3】 Teamwork makes von Neumann work: Min-Max Optimization in Two-Team Zero-Sum Games 标题:团队合作使冯·诺依曼工作:两队零和博弈中的最小-最大优化 链接:https://arxiv.org/abs/2111.04178

作者:Fivos Kalogiannis,Emmanouil-Vasileios Vlatakis-Gkaragkounis,Ioannis Panageas 摘要:受多人博弈在理论与应用两方面最新进展(从电子竞技到多智能体生成对抗网络)的启发,我们关注团队零和博弈中的min-max优化。在这类博弈中,玩家被分成两个团队,同一团队内的收益相同,而与对方团队的收益符号相反。与教科书式的两人零和博弈不同,可以证明在这类博弈中寻找纳什均衡是CLS困难的,即不太可能存在计算纳什均衡的多项式时间算法。此外,在这一推广框架中,我们证明了使用梯度下降上升(GDA)、其乐观变体以及额外梯度法,甚至连渐近的最后迭代收敛或时间平均收敛到纳什均衡都无法实现。具体地说,我们给出了一族团队博弈,其诱导效用是非多重线性的,其混合纳什均衡本身不具吸引性,而是底层优化景观的严格鞍点。利用控制理论的技术,我们通过设计局部收敛于纳什均衡的改进GDA来补充这些负面结果。最后,我们讨论了我们的框架与具有团队竞争结构的AI体系结构(如多智能体生成对抗网络)的联系。 摘要:Motivated by recent advances in both theoretical and applied aspects of multiplayer games, spanning from e-sports to multi-agent generative adversarial networks, we focus on min-max optimization in team zero-sum games. In this class of games, players are split in two teams with payoffs equal within the same team and of opposite sign across the opponent team. Unlike the textbook two-player zero-sum games, finding a Nash equilibrium in our class can be shown to be CLS-hard, i.e., it is unlikely to have a polynomial time algorithm for computing Nash equilibria. Moreover in this generalized framework, we establish that even asymptotic last iterate or time average convergence to a Nash Equilibrium is not possible using Gradient Descent Ascent (GDA), its optimistic variant and extra gradient. Specifically, we present a family of team games whose induced utility is \emph{non} multi-linear with \emph{non} attractive \emph{per-se} mixed Nash Equilibria, as strict saddle points of the underlying optimization landscape. Leveraging techniques from control theory, we complement these negative results by designing a modified GDA that converges locally to Nash equilibria. Finally, we discuss connections of our framework with AI architectures with team competition structure like multi-agent generative adversarial networks.
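作为摘要所讨论的GDA失效现象的最小演示,可以在一个双线性min-max目标上运行同步梯度下降上升:迭代不收敛到均衡,而是以不断增大的范数环绕它(目标与步长为示例假设,并非论文中的团队博弈构造):

```python
import numpy as np

# min_x max_y  x*y :唯一均衡在 (0, 0)
x, y, eta = 1.0, 1.0, 0.1
for t in range(100):
    gx, gy = y, x                         # 对 x 与对 y 的梯度
    x, y = x - eta * gx, y + eta * gy     # 同步 GDA 更新
print(x, y, np.hypot(x, y))               # 范数随迭代增大:GDA 不收敛到均衡
```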

【4】 Sampling from Log-Concave Distributions with Infinity-Distance Guarantees and Applications to Differentially Private Optimization 标题:具有无穷距离保证的对数凹分布采样及其在差分隐私优化中的应用 链接:https://arxiv.org/abs/2111.04089

作者:Oren Mangoubi,Nisheeth K. Vishnoi 机构:Worcester Polytechnic Institute, Yale University 摘要:对于多面体$K$上的$d$维对数凹分布$\pi(\theta)\propto e^{-f(\theta)}$,我们考虑从某个分布$\nu$输出样本的问题,要求$\nu$在无穷距离$\sup_{\theta\in K}|\log\frac{\nu(\theta)}{\pi(\theta)}|$意义下与$\pi$相差$O(\varepsilon)$。这种具有无穷距离保证的采样器在差分隐私优化中尤为需要,因为带有总变差距离或KL散度界的传统采样算法不足以保证差分隐私。我们的主要结果是一个算法,它输出一个来自在无穷距离上$O(\varepsilon)$-接近$\pi$的分布的点,需要$O((md+dL^2R^2)\times(LR+d\log(\frac{Rd+LRd}{\varepsilon r}))\times md^{\omega-1})$次算术运算,其中$f$是$L$-Lipschitz的,$K$由$m$个不等式定义、包含于半径为$R$的球内并包含一个半径较小的球$r$,而$\omega$是矩阵乘法常数。特别地,该运行时间关于$\frac{1}{\varepsilon}$是对数的,并显著改进了先前的工作。在技术上,我们不同于以往在$K$的$\frac{1}{\varepsilon^2}$离散化上构造马尔可夫链以获得$O(\varepsilon)$无穷距离误差样本的做法,而是提出了一种将$K$上带总变差界的连续样本转换为带无穷距离界的样本的方法。为了改善对$d$的依赖,我们提出了Dikin walk的一个"软阈值"版本,它可能具有独立的意义。将我们的算法嵌入指数机制框架,可以使$\varepsilon$-纯差分隐私算法在诸如Lipschitz凸函数的经验风险最小化和低秩近似等优化问题上的运行时间获得类似的改进,同时仍达到已知的最紧效用界。 摘要:For a $d$-dimensional log-concave distribution $\pi(\theta)\propto e^{-f(\theta)}$ on a polytope $K$, we consider the problem of outputting samples from a distribution $\nu$ which is $O(\varepsilon)$-close in infinity-distance $\sup_{\theta\in K}|\log\frac{\nu(\theta)}{\pi(\theta)}|$ to $\pi$. Such samplers with infinity-distance guarantees are specifically desired for differentially private optimization as traditional sampling algorithms which come with total-variation distance or KL divergence bounds are insufficient to guarantee differential privacy. Our main result is an algorithm that outputs a point from a distribution $O(\varepsilon)$-close to $\pi$ in infinity-distance and requires $O((md+dL^2R^2)\times(LR+d\log(\frac{Rd+LRd}{\varepsilon r}))\times md^{\omega-1})$ arithmetic operations, where $f$ is $L$-Lipschitz, $K$ is defined by $m$ inequalities, is contained in a ball of radius $R$ and contains a ball of smaller radius $r$, and $\omega$ is the matrix-multiplication constant. In particular this runtime is logarithmic in $\frac{1}{\varepsilon}$ and significantly improves on prior works. Technically, we depart from the prior works that construct Markov chains on a $\frac{1}{\varepsilon^2}$-discretization of $K$ to achieve a sample with $O(\varepsilon)$ infinity-distance error, and present a method to convert continuous samples from $K$ with total-variation bounds to samples with infinity bounds. To achieve improved dependence on $d$, we present a "soft-threshold" version of the Dikin walk which may be of independent interest. Plugging our algorithm into the framework of the exponential mechanism yields similar improvements in the running time of $\varepsilon$-pure differentially private algorithms for optimization problems such as empirical risk minimization of Lipschitz-convex functions and low-rank approximation, while still achieving the tightest known utility bounds.

【5】 Optimal and Efficient Dynamic Regret Algorithms for Non-Stationary Dueling Bandits 标题:非平稳决斗老虎机的最优且高效的动态遗憾算法 链接:https://arxiv.org/abs/2111.03917

作者:Shubham Gupta,Aadirupa Saha 机构:Indian Institute of Science 摘要:我们研究非平稳(时变)偏好下$K$臂决斗老虎机(Dueling Bandits)的动态遗憾最小化问题。这是一个在线学习设定:智能体在每一轮选择一对物品,并只观察到这对物品的相对二值"赢-输"反馈,该反馈从该轮的底层偏好矩阵中采样得到。我们首先研究对抗性偏好序列的静态遗憾最小化问题,并设计了一个高效算法,以高概率达到$O(\sqrt{KT})$的遗憾。接着,我们使用类似的算法思想,在两种非平稳性概念下,提出了一种高效且可证明最优的动态遗憾最小化算法。特别地,我们建立了$\tilde{O}(\sqrt{SKT})$和$\tilde{O}(V_T^{1/3}K^{1/3}T^{2/3})$的动态遗憾保证,其中$S$是底层偏好关系中"有效切换"的总数,$V_T$是"连续变化"非平稳性的度量。尽管非平稳环境在现实世界系统中非常实际,但在本工作之前,这些问题的复杂度尚未被研究。我们通过证明在上述两种非平稳性概念下相匹配的下界保证,论证了我们算法的最优性。最后,我们通过大量模拟验证了我们的结果,并与最先进的基线比较了算法的有效性。 摘要:We study the problem of \emph{dynamic regret minimization} in $K$-armed Dueling Bandits under non-stationary or time varying preferences. This is an online learning setup where the agent chooses a pair of items at each round and observes only a relative binary `win-loss' feedback for this pair, sampled from an underlying preference matrix at that round. We first study the problem of static-regret minimization for adversarial preference sequences and design an efficient algorithm with $O(\sqrt{KT})$ high probability regret. We next use similar algorithmic ideas to propose an efficient and provably optimal algorithm for dynamic-regret minimization under two notions of non-stationarities. In particular, we establish $\tO(\sqrt{SKT})$ and $\tO({V_T^{1/3}K^{1/3}T^{2/3}})$ dynamic-regret guarantees, $S$ being the total number of `effective-switches' in the underlying preference relations and $V_T$ being a measure of `continuous-variation' non-stationarity. The complexity of these problems have not been studied prior to this work despite the practicability of non-stationary environments in real world systems. We justify the optimality of our algorithms by proving matching lower bound guarantees under both the above-mentioned notions of non-stationarities. Finally, we corroborate our results with extensive simulations and compare the efficacy of our algorithms over state-of-the-art baselines.

【6】 Stock Portfolio Optimization Using a Deep Learning LSTM Model 标题:基于深度学习LSTM模型的股票投资组合优化 链接:https://arxiv.org/abs/2111.04709

作者:Jaydip Sen,Abhishek Dutta,Sidra Mehtab 机构:Department of Data Science, Praxis Business School, Kolkata, INDIA 备注:This is the accepted version of our paper in the international conference, IEEE Mysurucon'21, which was organized in Hassan, Karnataka, India from October 24, 2021 to October 25, 2021. The paper is 9 pages long, and it contains 19 figures and 19 tables. This is the preprint of the conference paper 摘要:预测未来股票价格及其运动模式是一个复杂的问题。因此,利用预测价格构建资本资产组合,以在收益与风险之间实现优化,是一项更加困难的任务。本研究分析了2016年1月1日至2020年12月31日期间印度股市九个不同板块排名前五的股票的历史价格时间序列,并为每个板块构建最优投资组合。为了预测未来股价,我们还设计并微调了一个长短期记忆(LSTM)模型。在投资组合构建五个月后,计算每个投资组合的实际和预测收益与风险。各投资组合的预测收益与实际收益均较高,表明LSTM模型具有较高的精度。 摘要:Predicting future stock prices and their movement patterns is a complex problem. Hence, building a portfolio of capital assets using the predicted prices to achieve the optimization between its return and risk is an even more difficult task. This work has carried out an analysis of the time series of the historical prices of the top five stocks from the nine different sectors of the Indian stock market from January 1, 2016, to December 31, 2020. Optimum portfolios are built for each of these sectors. For predicting future stock prices, a long-and-short-term memory (LSTM) model is also designed and fine-tuned. After five months of the portfolio construction, the actual and the predicted returns and risks of each portfolio are computed. The predicted and the actual returns of each portfolio are found to be high, indicating the high precision of the LSTM model.
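作为示意,下面用tf.keras搭一个用于单变量价格序列预测的最小LSTM(窗口长度、层宽与训练设置均为假设,价格序列为占位的随机游走,并非论文的精确配置与数据):

```python
import numpy as np
import tensorflow as tf

WINDOW = 50
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1),          # 预测下一日(归一化)收盘价
])
model.compile(optimizer="adam", loss="mse")

prices = np.cumsum(np.random.randn(1000)).astype("float32")   # 占位价格序列
X = np.stack([prices[i:i + WINDOW] for i in range(len(prices) - WINDOW)])[..., None]
y = prices[WINDOW:]
model.fit(X, y, epochs=2, batch_size=32, verbose=0)           # 实际应用需更充分训练
```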

【7】 Mixed-Integer Optimization with Constraint Learning 标题:带约束学习的混合整数优化 链接:https://arxiv.org/abs/2111.04469

作者:Donato Maragno,Holly Wiberg,Dimitris Bertsimas,S. Ilker Birbil,Dick den Hertog,Adejuyigbe Fajemisin 机构:Amsterdam Business School, University of Amsterdam, TV Amsterdam, Netherlands 摘要:我们为带有学习约束的混合整数优化建立了广泛的方法论基础。我们提出了一个用于数据驱动决策的端到端管道:使用机器学习直接从数据中学习约束和目标,并将训练好的模型嵌入到优化公式中。我们利用了许多机器学习方法(包括线性模型、决策树、集成方法和多层感知机)可被混合整数优化表示的特性。考虑多种方法使我们能够刻画决策、上下文变量和结果之间的各种潜在关系。我们还使用观测数据的凸包来刻画决策信任域,以确保建议可信并避免外推,并通过列生成和聚类高效地引入这一表示。结合领域驱动的约束和目标项,嵌入的模型与信任域共同定义了一个用于生成处方的混合整数优化问题。我们将该框架实现为一个供实践者使用的Python包(OptiCL),并在化疗优化和世界粮食计划署规划中演示了该方法。案例研究说明了该框架在生成高质量处方方面的优势、信任域带来的附加价值、融合多种机器学习方法的能力,以及纳入多个学习约束的可能性。 摘要:We establish a broad methodological foundation for mixed-integer optimization with learned constraints. We propose an end-to-end pipeline for data-driven decision making in which constraints and objectives are directly learned from data using machine learning, and the trained models are embedded in an optimization formulation. We exploit the mixed-integer optimization-representability of many machine learning methods, including linear models, decision trees, ensembles, and multi-layer perceptrons. The consideration of multiple methods allows us to capture various underlying relationships between decisions, contextual variables, and outcomes. We also characterize a decision trust region using the convex hull of the observations, to ensure credible recommendations and avoid extrapolation. We efficiently incorporate this representation using column generation and clustering. In combination with domain-driven constraints and objective terms, the embedded models and trust region define a mixed-integer optimization problem for prescription generation. We implement this framework as a Python package (OptiCL) for practitioners. We demonstrate the method in both chemotherapy optimization and World Food Programme planning. The case studies illustrate the benefit of the framework in generating high-quality prescriptions, the value added by the trust region, the incorporation of multiple machine learning methods, and the inclusion of multiple learned constraints.

【8】 AGGLIO: Global Optimization for Locally Convex Functions 标题:AGGLIO:局部凸函数的全局优化 链接:https://arxiv.org/abs/2111.03932

作者:Debojyoti Dey,Bhaskar Mukhoty,Purushottam Kar 机构:IIT Kanpur 备注:33 pages, 7 figures, to appear at 9th ACM IKDD Conference on Data Science (CODS) 2022. Code for AGGLIO is available at this https URL 摘要:本文介绍了AGGLIO(加速分级广义线性模型优化),这是一种分阶段的渐进(graduated)优化技术,为目标函数仅具有局部凸性、在全局尺度上甚至可能不是拟凸的非凸优化问题提供全局收敛保证。特别地,这包括使用sigmoid、softplus和SiLU等流行激活函数(它们会产生非凸训练目标)的学习问题。AGGLIO可以很容易地用逐点或小批量SGD更新实现,并在一般条件下提供可证明的全局最优收敛性。在实验中,AGGLIO在收敛速度和收敛精度方面均优于最近提出的若干针对非凸和局部凸目标的优化技术。AGGLIO依赖于针对广义线性模型的分级技术以及一种新颖的证明策略,两者都可能具有独立的意义。 摘要:This paper presents AGGLIO (Accelerated Graduated Generalized LInear-model Optimization), a stage-wise, graduated optimization technique that offers global convergence guarantees for non-convex optimization problems whose objectives offer only local convexity and may fail to be even quasi-convex at a global scale. In particular, this includes learning problems that utilize popular activation functions such as sigmoid, softplus and SiLU that yield non-convex training objectives. AGGLIO can be readily implemented using point as well as mini-batch SGD updates and offers provable convergence to the global optimum in general conditions. In experiments, AGGLIO outperformed several recently proposed optimization techniques for non-convex and locally convex objectives in terms of convergence rate as well as convergent accuracy. AGGLIO relies on a graduation technique for generalized linear models, as well as a novel proof strategy, both of which may be of independent interest.
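分阶段渐进优化的一般骨架是:先在"更平滑、局部更接近凸"的代理目标上运行SGD,再逐级收紧回原目标。下面是这一思想的通用示意(以温度参数控制sigmoid的平滑程度;具体的分级方式、步长与更新公式均为假设,并非AGGLIO原文的分级方案):

```python
import numpy as np

def sigmoid(z, T=1.0):
    return 1.0 / (1.0 + np.exp(-z / T))   # T 越大目标越平滑

def graduated_sgd(X, y, temps=(8.0, 4.0, 2.0, 1.0), lr=0.1, epochs=200):
    rng = np.random.default_rng(0)
    w = np.zeros(X.shape[1])
    for T in temps:                        # 逐级降低温度(逐步收紧目标)
        for _ in range(epochs):
            i = rng.integers(len(X))
            p = sigmoid(X[i] @ w, T)
            # 平方损失 (p - y)^2 / 2 对 w 的随机梯度
            w -= lr * (p - y[i]) * p * (1 - p) / T * X[i]
    return w

X = np.random.randn(500, 3)
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)
w = graduated_sgd(X, y)
```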

【9】 Distributed stochastic proximal algorithm with random reshuffling for non-smooth finite-sum optimization 标题:基于随机重排的分布式随机近端算法求解非光滑有限和优化 链接:https://arxiv.org/abs/2111.03820

作者:Xia Jiang,Xianlin Zeng,Jian Sun,Jie Chen,Lihua Xie 机构:Key Laboratory of Intelligent Control and Decision of Complex Systems, School of Automation, Beijing Institute of Technology 备注:15 pages, 7 figures 摘要:非光滑有限和极小化是机器学习中的一个基本问题。针对时变多智能体网络上的有限和极小化问题,本文提出了一种带随机重排的分布式随机近端梯度算法。目标函数是可微凸函数与非光滑正则项之和。网络中的每个智能体利用局部信息以恒定步长更新局部变量,并通过协作寻求最优解。我们证明了该算法生成的局部变量估计达成一致,并在期望意义下以$\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$的收敛速度被吸引到最优解的邻域。此外,本文还表明,通过选择足够小的步长,目标函数的稳态误差可以任意小。最后,通过对比仿真验证了该算法的收敛性能。 摘要:The non-smooth finite-sum minimization is a fundamental problem in machine learning. This paper develops a distributed stochastic proximal-gradient algorithm with random reshuffling to solve the finite-sum minimization over time-varying multi-agent networks. The objective function is a sum of differentiable convex functions and non-smooth regularization. Each agent in the network updates local variables with a constant step-size by local information and cooperates to seek an optimal solution. We prove that local variable estimates generated by the proposed algorithm achieve consensus and are attracted to a neighborhood of the optimal solution in expectation with an $\mathcal{O}(\frac{1}{T}+\frac{1}{\sqrt{T}})$ convergence rate. In addition, this paper shows that the steady-state error of the objective function can be arbitrarily small by choosing small enough step-sizes. Finally, some comparative simulations are provided to verify the convergence performance of the proposed algorithm.
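下面给出这类方法在单个智能体上的核心更新示意:每个epoch随机重排本地样本,做随机梯度步后接近端(软阈值)步;为简化,邻居一致性以简单均值代替时变网络的混合矩阵(损失、正则与步长均为示例假设):

```python
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)   # L1 正则的近端算子

def local_epoch(w, X, y, neighbors_w, step=0.05, lam=0.01):
    rng = np.random.default_rng(0)
    for i in rng.permutation(len(X)):                    # 随机重排(random reshuffling)
        g = (X[i] @ w - y[i]) * X[i]                     # 平方损失的随机梯度
        w = soft_threshold(w - step * g, step * lam)     # 梯度步 + 近端步
    return np.mean([w] + list(neighbors_w), axis=0)      # 与邻居做一致性平均

w = np.zeros(3)
X = np.random.randn(100, 3)
y = X @ np.array([1.0, 0.0, -1.0])
w = local_epoch(w, X, y, neighbors_w=[np.zeros(3)])
```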

预测|估计(11篇)

【1】 Data-driven Set-based Estimation of Polynomial Systems with Application to SIR Epidemics 标题:基于数据驱动集的多项式系统估计及其在SIR流行病中的应用 链接:https://arxiv.org/abs/2111.04704

作者:Amr Alanwar,Muhammad Umar B. Niazi,Karl H. Johansson 机构: KTH Royal Institute of Technology 摘要:针对一类具有多项式非线性的非线性系统,本文提出了一种数据驱动的基于集合的估计算法。该方法利用系统的输入输出数据,实时计算一个保证包含系统状态的集合。虽然系统被假定为多项式类型,但无需知道精确的多项式函数及其系数。为此,估计器包含离线和在线两个阶段:离线阶段利用过去的输入输出数据来估计多项式系统可能系数的集合;随后,在线阶段利用这一系数估计集合以及关于系统的边信息,给出状态的集合估计。最后,通过在SIR(易感、感染、恢复)流行病模型上的应用,对所提方法进行了评估。 摘要:This paper proposes a data-driven set-based estimation algorithm for a class of nonlinear systems with polynomial nonlinearities. Using the system's input-output data, the proposed method computes in real-time a set that guarantees the inclusion of the system's state. Although the system is assumed to be polynomial type, the exact polynomial functions and their coefficients need not be known. To this end, the estimator relies on offline and online phases. The offline phase utilizes past input-output data to estimate a set of possible coefficients of the polynomial system. Then, using this estimated set of coefficients and the side information about the system, the online phase provides a set estimate of the state. Finally, the proposed methodology is evaluated through its application on SIR (Susceptible, Infected, Recovered) epidemic model.

【2】 Sexism Prediction in Spanish and English Tweets Using Monolingual and Multilingual BERT and Ensemble Models 标题:使用单语和多语BERT和集成模型预测西班牙语和英语推文中的性别歧视 链接:https://arxiv.org/abs/2111.04551

作者:Angel Felipe Magnossão de Paula,Roberto Fray da Silva,Ipek Baris Schlicht 机构: Universitat Politcnica de Valncia, Spain, Escola Politcnica da Universidade de So Paulo 备注:18 pages, presented at IberLEF: this http URL, the best scoring system at EXIST 摘要:社交媒体的普及带来了仇恨言论和性别歧视等问题。社交媒体中性别歧视的识别和分类是非常重要的任务,因为这将有助于建立一个更健康的社交环境。然而,这些任务相当具有挑战性。这项工作提出了一个系统,使用多语言和单语BERT、数据点翻译以及集成策略来识别和分类英语和西班牙语中的性别歧视。该工作是在伊比利亚语言评估论坛(IberLEF)提出的“社交网络中的性别歧视识别”共享任务(EXIST 2021)背景下完成的。文中描述了所提出的系统及其主要组件,并进行了深入的超参数分析。观察到的主要结果是:(i)系统比基线模型(多语言BERT)获得更好的结果;(ii)集成模型比单语模型获得更好的结果;(iii)考虑所有单个模型和最佳标准化值的集成模型在两项任务上均获得了最佳准确率和F1分数。这项工作在EXIST的两项任务中均获得第一名,取得了最高的准确率(任务1为0.780,任务2为0.658)和F1分数(任务1的F1-binary为0.780,任务2的F1-macro为0.579)。 摘要:The popularity of social media has created problems such as hate speech and sexism. The identification and classification of sexism in social media are very relevant tasks, as they would allow building a healthier social environment. Nevertheless, these tasks are considerably challenging. This work proposes a system to use multilingual and monolingual BERT and data points translation and ensemble strategies for sexism identification and classification in English and Spanish. It was conducted in the context of the sEXism Identification in Social neTworks shared 2021 (EXIST 2021) task, proposed by the Iberian Languages Evaluation Forum (IberLEF). The proposed system and its main components are described, and an in-depth hyperparameters analysis is conducted. The main results observed were: (i) the system obtained better results than the baseline model (multilingual BERT); (ii) ensemble models obtained better results than monolingual models; and (iii) an ensemble model considering all individual models and the best standardized values obtained the best accuracies and F1-scores for both tasks. This work obtained first place in both tasks at EXIST, with the highest accuracies (0.780 for task 1 and 0.658 for task 2) and F1-scores (F1-binary of 0.780 for task 1 and F1-macro of 0.579 for task 2).
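摘要中“考虑所有单个模型与最佳标准化值的集成”可以用一个带假设的极简示意来理解:对每个模型的类别得分先做 z-score 标准化,再取平均并取 argmax。具体的标准化与融合细节以论文为准,以下数据均为虚构。

```python
import numpy as np

def standardized_ensemble(score_list):
    """score_list: 若干 (n_samples, n_classes) 的模型得分矩阵。
    先对每个模型的得分按列做 z-score 标准化,再平均并取 argmax。"""
    zs = []
    for s in score_list:
        s = np.asarray(s, dtype=float)
        zs.append((s - s.mean(axis=0)) / (s.std(axis=0) + 1e-8))
    return np.mean(zs, axis=0).argmax(axis=1)

# 三个假想模型在 4 条推文、2 个类别上的得分
m1 = [[0.2, 0.8], [0.6, 0.4], [0.9, 0.1], [0.3, 0.7]]
m2 = [[0.1, 0.9], [0.7, 0.3], [0.8, 0.2], [0.4, 0.6]]
m3 = [[0.3, 0.7], [0.5, 0.5], [0.7, 0.3], [0.2, 0.8]]
print(standardized_ensemble([m1, m2, m3]))  # 每条推文的预测类别
```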

【3】 Weapon Engagement Zone Maximum Launch Range Estimation Using a Deep Neural Network 标题:基于深度神经网络的武器交战区域最大射程估计 链接:https://arxiv.org/abs/2111.04474

作者:Joao P. A. Dantas,Andre N. Costa,Diego Geraldo,Marcos R. O. A. Maximo,Takashi Yoneyama 摘要:这项工作研究了使用深度神经网络(DNN)来估计武器交战区(WEZ)的最大发射距离。WEZ允许飞行员识别可用导弹成功打击特定目标的可能性更大的空域,即敌方易受射击的飞机周围的假设区域。我们提出了一种在可变条件下使用50000次模拟发射确定给定导弹WEZ的方法。这些模拟用于训练DNN,当飞机处于不同的射击条件时,该DNN可以预测WEZ,确定系数为0.99。相较于已有研究,该方法提供了一种新的流程:它采用非离散化模型,即同时考虑WEZ的所有方向,这在以往工作中尚未实现。此外,所提出的方法使用了一种试验设计,允许更少的仿真运行次数,从而加快模型训练。 摘要:This work investigates the use of a Deep Neural Network (DNN) to perform an estimation of the Weapon Engagement Zone (WEZ) maximum launch range. The WEZ allows the pilot to identify an airspace in which the available missile has a more significant probability of successfully engaging a particular target, i.e., a hypothetical area surrounding an aircraft in which an adversary is vulnerable to a shot. We propose an approach to determine the WEZ of a given missile using 50,000 simulated launches in variate conditions. These simulations are used to train a DNN that can predict the WEZ when the aircraft finds itself on different firing conditions, with a coefficient of determination of 0.99. It provides another procedure concerning preceding research since it employs a non-discretized model, i.e., it considers all directions of the WEZ at once, which has not been done previously. Additionally, the proposed method uses an experimental design that allows for fewer simulation runs, providing faster model training.

【4】 DVS: Deep Visibility Series and its Application in Construction Cost Index Forecasting 标题:DVS:深度可视系列及其在工程造价指数预测中的应用 链接:https://arxiv.org/abs/2111.04071

作者:Tianxiang Zhan,Yuanpeng He,Hanwen Li,Fuyuan Xiao 摘要:时间序列预测一直是科学研究的热点。随着人工智能的发展,新的时间序列预测方法通过仿生研究和对以往方法的改进,取得了更好的预测效果和预测性能。在以往的研究中,可视图(VG)算法常用于时间序列预测,但其预测效果不如人工神经网络(ANN)、卷积神经网络(CNN)和长短时记忆网络(LSTM)等深度学习预测方法。VG算法包含了丰富的网络信息,但以往的研究没有有效地利用网络信息进行预测,导致预测误差较大。为了解决这一问题,本文通过对VG的仿生设计和对以往研究的扩展,提出了深度可视系列(DVS)模块,这是首次将VG与仿生设计和深度网络相结合。将生物视觉仿生设计应用于VG,使DVS时间序列获得了较高的预测精度,为时间序列预测做出了贡献。同时,本文将DVS预测方法应用到工程造价指标预测中,具有一定的现实意义。 摘要:Time series forecasting has always been a hot spot in scientific research. With the development of artificial intelligence, new time series forecasting methods have obtained better forecasting effects and forecasting performance through bionic research and improvements to the past methods. Visibility Graph (VG) algorithm is often used for time series prediction in previous research, but the prediction effect is not as good as deep learning prediction methods such as Artificial Neural Network (ANN), Convolutional Neural Network (CNN) and Long Short-Term Memory Network (LSTM) prediction. The VG algorithm contains a wealth of network information, but previous studies did not effectively use the network information to make predictions, resulting in relatively large prediction errors. In order to solve this problem, this paper proposes the Deep Visibility Series (DVS) module through the bionic design of VG and the expansion of the past research, which is the first time to combine VG with bionic design and deep network. By applying the bionic design of biological vision to VG, the time series of DVS has obtained superior forecast accuracy, which has made a contribution to time series forecasting. At the same time, this paper applies the DVS forecasting method to the construction cost index forecast, which has practical significance.
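作为背景,可视图(VG)把时间序列映射为网络:两个时刻的点相连,当且仅当它们之间的连线不被任何中间点“遮挡”。下面是标准自然可视图判据的极简实现,仅用于帮助理解DVS所依赖的基础结构,与DVS模块本身的设计无关。

```python
import numpy as np

def natural_visibility_graph(y):
    """自然可视图:返回邻接矩阵 A。节点 a、b 相连当且仅当任意中间点 c 满足
    y[c] < y[a] + (y[b] - y[a]) * (c - a) / (b - a)。"""
    y = np.asarray(y, dtype=float)
    n = len(y)
    A = np.zeros((n, n), dtype=int)
    for a in range(n):
        for b in range(a + 1, n):
            line = y[a] + (y[b] - y[a]) * (np.arange(a + 1, b) - a) / (b - a)
            if np.all(y[a + 1:b] < line):   # 相邻两点之间没有中间点,恒为可见
                A[a, b] = A[b, a] = 1
    return A

print(natural_visibility_graph([3.0, 1.0, 2.0, 0.5, 4.0]))
```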

【5】 Predictive Model for Gross Community Production Rate of Coral Reefs using Ensemble Learning Methodologies 标题:基于集成学习方法的珊瑚礁群落总生产力预测模型 链接:https://arxiv.org/abs/2111.04003

作者:Umanandini S,Aouthithiye Barathwaj SR Y,Jasline Augusta J,Shrirang Sapate,Reenasree S,Vigneash M 机构:C 备注:8 pages, 18 figures 摘要:珊瑚礁在维持海洋生态系统的生态平衡方面起着至关重要的作用。各种海洋生物依靠珊瑚礁生存和自然过程。珊瑚礁为海洋生态系统中各种外来物种的繁殖和生长提供了必要的栖息地。在本文中,我们讨论了影响珊瑚和珊瑚礁生命周期的最重要参数,如海洋酸化、脱氧和其他物理参数,如流速和表面积。海洋酸化取决于溶解二氧化碳(CO2)的数量。这是由于溶解的CO2气体与海洋中的碳酸钙化合物反应时释放出H+离子。脱氧是另一个导致缺氧的问题,其特点是水中溶解氧的数量少于海洋生物生存所需的数量。在这篇文章中,我们强调了物理参数的重要性,例如影响气体交换、散热、漂白敏感性、营养供应、喂食、废物和沉积物去除、生长和繁殖的流量。在本文中,我们还提出了这些重要参数,并提出了一个基于集成机器学习的模型来分析这些参数,并提供更好的速率,帮助我们理解和适当改善海洋组成,从而显著提高海洋生态系统(主要是珊瑚礁)的可持续性 摘要:Coral reefs play a vital role in maintaining the ecological balance of the marine ecosystem. Various marine organisms depend on coral reefs for their existence and their natural processes. Coral reefs provide the necessary habitat for reproduction and growth for various exotic species of the marine ecosystem. In this article, we discuss the most important parameters which influence the lifecycle of coral and coral reefs such as ocean acidification, deoxygenation and other physical parameters such as flow rate and surface area. Ocean acidification depends on the amount of dissolved Carbon dioxide (CO2). This is due to the release of H+ ions upon the reaction of the dissolved CO2 gases with the calcium carbonate compounds in the ocean. Deoxygenation is another problem that leads to hypoxia which is characterized by a lesser amount of dissolved oxygen in water than the required amount for the existence of marine organisms. In this article, we highlight the importance of physical parameters such as flow rate which influence gas exchange, heat dissipation, bleaching sensitivity, nutrient supply, feeding, waste and sediment removal, growth and reproduction. In this paper, we also bring out these important parameters and propose an ensemble machine learning-based model for analyzing these parameters and provide better rates that can help us to understand and suitably improve the ocean composition which in turn can eminently improve the sustainability of the marine ecosystem, mainly the coral reefs

【6】 Smooth tensor estimation with unknown permutations 标题:具有未知排列的光滑张量估计 链接:https://arxiv.org/abs/2111.04681

作者:Chanwoo Lee,Miaoyan Wang 机构:University of Wisconsin – Madison 备注:37 pages, 10 figures, 10 tables 摘要:我们考虑存在未知排列的结构张量去噪问题。此类数据问题通常出现在推荐系统、神经成像、社区检测和多路比较应用中。在这里,我们发展了一个允许任意索引排列的光滑张量模型一般族;该族将流行的张量块模型和Lipschitz超图极限(hypergraphon)模型作为特例包含在内。我们证明了分块多项式族中的约束最小二乘估计达到了极小极大误差界。我们揭示了关于最优恢复所需平滑度阈值的相变现象。特别是,我们发现次数不超过$(m-2)(m+1)/2$的多项式足以精确恢复$m$阶张量,而更高的次数并不带来额外的好处。这一现象揭示了带与不带未知置换的光滑张量估计问题之间的内在区别。此外,我们提供了一个高效的多项式时间Borda计数算法,该算法在单调性假设下可证明达到最优速率。通过模拟和芝加哥犯罪数据分析证明了我们方法的有效性。 摘要:We consider the problem of structured tensor denoising in the presence of unknown permutations. Such data problems arise commonly in recommendation system, neuroimaging, community detection, and multiway comparison applications. Here, we develop a general family of smooth tensor models up to arbitrary index permutations; the model incorporates the popular tensor block models and Lipschitz hypergraphon models as special cases. We show that a constrained least-squares estimator in the block-wise polynomial family achieves the minimax error bound. A phase transition phenomenon is revealed with respect to the smoothness threshold needed for optimal recovery. In particular, we find that a polynomial of degree up to $(m-2)(m+1)/2$ is sufficient for accurate recovery of order-$m$ tensors, whereas higher degree exhibits no further benefits. This phenomenon reveals the intrinsic distinction for smooth tensor estimation problems with and without unknown permutations. Furthermore, we provide an efficient polynomial-time Borda count algorithm that provably achieves optimal rate under monotonicity assumptions. The efficacy of our procedure is demonstrated through both simulations and Chicago crime data analysis.

【7】 A Private and Computationally-Efficient Estimator for Unbounded Gaussians 标题:一种面向无界高斯分布的私有且计算高效的估计器 链接:https://arxiv.org/abs/2111.04609

作者:Gautam Kamath,Argyris Mouzakis,Vikrant Singhal,Thomas Steinke,Jonathan Ullman 机构: University of Waterloo, Northeastern University 摘要:我们给出了$\mathbb{R}^d$中任意高斯分布$\mathcal{N}(\mu,\Sigma)$的均值和协方差的首个多项式时间、多项式样本的差分隐私估计器。以前的所有估计器要么是非构造性的、运行时间无界,要么要求用户事先指定参数$\mu$和$\Sigma$的先验界。我们算法中的主要新技术工具是一个新的差分隐私预条件器,它从任意高斯分布$\mathcal{N}(0,\Sigma)$中采样,并返回一个矩阵$A$,使得$A\Sigma A^T$具有恒定的条件数。 摘要:We give the first polynomial-time, polynomial-sample, differentially private estimator for the mean and covariance of an arbitrary Gaussian distribution $\mathcal{N}(\mu,\Sigma)$ in $\mathbb{R}^d$. All previous estimators are either nonconstructive, with unbounded running time, or require the user to specify a priori bounds on the parameters $\mu$ and $\Sigma$. The primary new technical tool in our algorithm is a new differentially private preconditioner that takes samples from an arbitrary Gaussian $\mathcal{N}(0,\Sigma)$ and returns a matrix $A$ such that $A \Sigma A^T$ has constant condition number.

【8】 Simultaneous estimation of wall and object parameters in TWR using deep neural network 标题:基于深度神经网络的穿墙雷达墙体参数与目标参数同时估计 链接:https://arxiv.org/abs/2111.04568

作者:Fardin Ghorbani,Hossein Soleimani 机构: are with the School of Electrical Engineering, Iran University of Science and Technology 摘要:本文提出了一种在穿墙雷达中同时估计目标和墙参数的深度学习模型。在这项工作中,我们考虑两种模式:单目标和两个目标。在这两种情况下,我们考虑的壁的介电常数和厚度,以及目标的中心和介电常数的二维坐标。这意味着,在单个目标的情况下,我们估计五个值,而在两个目标的情况下,我们同时估计八个值,每个值代表上述参数。我们发现,当使用深度神经网络解决目标定位问题时,给模型更多的问题参数可以提高定位精度。因此,我们在问题中加入了两个壁面参数,并发现在估计壁面参数的同时,目标定位的精度得到了提高。通过使用深度神经网络模型,我们能够以99%的精度估计单靶和双靶模式下的壁介电常数和厚度参数,以及目标的二维坐标和介电常数。 摘要:This paper presents a deep learning model for simultaneously estimating target and wall parameters in Through-the-Wall Radar. In this work, we consider two modes: single-target and two-targets. In both cases, we consider the permittivity and thickness for the wall, as well as the two-dimensional coordinates of the target's center and permittivity. This means that in the case of a single target, we estimate five values, whereas, in the case of two targets, we estimate eight values simultaneously, each of which represents the mentioned parameters. We discovered that when using deep neural networks to solve the target locating problem, giving the model more parameters of the problem increases the location accuracy. As a result, we included two wall parameters in the problem and discovered that the accuracy of target locating improves while the wall parameters are estimated. We were able to estimate the parameters of wall permittivity and thickness, as well as two-dimensional coordinates and permittivity of targets in single-target and two-target modes with 99\% accuracy by using a deep neural network model.

【9】 AI challenges for predicting the impact of mutations on protein stability 标题:预测突变对蛋白质稳定性影响的人工智能挑战 链接:https://arxiv.org/abs/2111.04208

作者:Fabrizio Pucci,Martin Schwersensky,Marianne Rooman 机构:Computational Biology and Bioinformatics, Université Libre de Bruxelles, Brussels, Belgium, Interuniversity Institute of Bioinformatics in Brussels 摘要:稳定性是蛋白质适应度(fitness)的关键组成部分,通过靶向突变对其进行修饰在蛋白质工程、药物设计和有害变异解释等领域有着广泛的应用。在过去几十年中,基于人工智能(AI)的最新发展,许多研究致力于建立新的、更有效的方法来预测突变对蛋白质稳定性的影响。我们讨论了这些方法的特性、算法、计算效率,以及在独立测试集上估计的精度。我们重点对它们的局限性、对训练集的反复偏倚、泛化能力和可解释性进行了批判性分析。我们发现,15年多来,预测器的精度一直停滞在约1 kcal/mol。最后,我们讨论了为提升性能所需要解决的挑战。 摘要:Stability is a key ingredient of protein fitness and its modification through targeted mutations has applications in various fields such as protein engineering, drug design and deleterious variant interpretation. Many studies have been devoted over the past decades to building new, more effective methods for predicting the impact of mutations on protein stability, based on the latest developments in artificial intelligence (AI). We discuss their features, algorithms, computational efficiency, and accuracy estimated on an independent test set. We focus on a critical analysis of their limitations, the recurrent biases towards the training set, their generalizability and interpretability. We found that the accuracy of the predictors has stagnated at around 1 kcal/mol for over 15 years. We conclude by discussing the challenges that need to be addressed to reach improved performance.

【10】 Damage Estimation and Localization from Sparse Aerial Imagery 标题:稀疏航空影像的损伤估计与定位 链接:https://arxiv.org/abs/2111.03708

作者:Rene Garcia Franceschini,Jeffrey Liu,Saurabh Amin 机构:MIT Lincoln Laboratory, Lexington, MA, Dept. of Civil and Environmental Engineering, Massachusetts Institute of Technology, Cambridge, MA 摘要:航空图像为应对飓风等自然灾害提供了重要的态势感知。它们非常适合为损伤估计和定位(DEL)提供信息,即刻画灾后损害的类型与空间范围。尽管传感和无人驾驶航空系统技术近来取得了进展,许多灾后航空图像仍然是由小型有人驾驶固定翼飞机上的手持单反相机拍摄的。然而,这些手持相机缺乏IMU信息,且影像是由操作员在事件发生后机会性地拍摄的。因此,基于此类影像进行DEL仍然是一个高度人工且耗时的过程。我们提出了一种在航空图像中检测损害并在世界坐标系中定位损害的方法,特别侧重于洪水的检测与定位。该方法基于运动恢复结构(structure from motion),通过投影变换将图像坐标与世界坐标关联起来,使用类激活映射检测图像中的损害范围,再应用该投影变换在世界坐标系中定位损害。我们基于2016年路易斯安那州洪水的灾后数据评估了该方法的性能,发现其精度达到88%。鉴于使用有限数据即可获得如此高的精度,我们认为这种方法目前可行,能够为灾害响应从手持航空影像中快速有效地完成DEL。 摘要:Aerial images provide important situational awareness for responding to natural disasters such as hurricanes. They are well-suited for providing information for damage estimation and localization (DEL); i.e., characterizing the type and spatial extent of damage following a disaster. Despite recent advances in sensing and unmanned aerial systems technology, much of post-disaster aerial imagery is still taken by handheld DSLR cameras from small, manned, fixed-wing aircraft. However, these handheld cameras lack IMU information, and images are taken opportunistically post-event by operators. As such, DEL from such imagery is still a highly manual and time-consuming process. We propose an approach to both detect damage in aerial images and localize it in world coordinates, with specific focus on detecting and localizing flooding. The approach is based on using structure from motion to relate image coordinates to world coordinates via a projective transformation, using class activation mapping to detect the extent of damage in an image, and applying the projective transformation to localize damage in world coordinates. We evaluate the performance of our approach on post-event data from the 2016 Louisiana floods, and find that our approach achieves a precision of 88%. Given this high precision using limited data, we argue that this approach is currently viable for fast and effective DEL from handheld aerial imagery for disaster response.
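文中的定位步骤本质上是用单应(投影)变换把图像坐标映射到世界坐标。下面用 OpenCV 给出一个示意,其中的对应点与受损区域坐标均为虚构数据,仅说明接口用法:

```python
import cv2
import numpy as np

# 假设已通过 SfM / 地面控制点得到 4 对以上“图像坐标 <-> 世界坐标”对应(虚构)
img_pts = np.array([[100, 200], [400, 210], [420, 500], [90, 480]], dtype=np.float32)
world_pts = np.array([[0, 0], [30, 0], [30, 25], [0, 25]], dtype=np.float32)

H, _ = cv2.findHomography(img_pts, world_pts, cv2.RANSAC)  # 估计单应矩阵

# 把类激活映射(CAM)检出的受损区域轮廓点投影到世界坐标
damage_px = np.array([[[150, 300]], [[200, 320]], [[180, 350]]], dtype=np.float32)
damage_world = cv2.perspectiveTransform(damage_px, H)
print(damage_world.reshape(-1, 2))
```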

【11】 Predicting Mortality from Credit Reports 标题:从信用报告预测死亡率 链接:https://arxiv.org/abs/2111.03662

作者:Giacomo De Giorgi,Matthew Harding,Gabriel Vasconcelos 机构:Geneva School of Economics and Management, University of Geneva, Department of Economics, University of California, Irvine, Gabriel F. R. Vasconcelos 摘要:许多国家定期收集与个人消费金融行为(如信用卡和贷款活动)相关的数百个变量的数据,这些数据在贷款决策中发挥着重要作用。我们假设,这些数据的详细性质可用于预测看似无关的领域(如个人健康)的结果。我们建立了一系列机器学习模型来证明信用报告数据可以用来预测个人死亡率。与信用卡和各种贷款(主要是无担保贷款)相关的变量组具有显著的预测能力。这些变量的滞后也很显著,因此表明动力学也很重要。基于消费者金融数据的死亡率预测的改进可能对保险市场产生重要的经济影响,但也可能引起隐私问题。 摘要:Data on hundreds of variables related to individual consumer finance behavior (such as credit card and loan activity) is routinely collected in many countries and plays an important role in lending decisions. We postulate that the detailed nature of this data may be used to predict outcomes in seemingly unrelated domains such as individual health. We build a series of machine learning models to demonstrate that credit report data can be used to predict individual mortality. Variable groups related to credit cards and various loans, mostly unsecured loans, are shown to carry significant predictive power. Lags of these variables are also significant thus indicating that dynamics also matters. Improved mortality predictions based on consumer finance data can have important economic implications in insurance markets but may also raise privacy concerns.

其他神经网络|深度学习|模型|建模(30篇)

【1】 Efficiently Learning Any One Hidden Layer ReLU Network From Queries 标题:从查询中高效学习任意单隐层ReLU网络 链接:https://arxiv.org/abs/2111.04727

作者:Sitan Chen,Adam R Klivans,Raghu Meka 机构:MIT, Adam R. Klivans†, UT Austin, UCLA 备注:To appear in Advances in Neural Information Processing Systems (NeurIPS 2021) 摘要:模型提取攻击重新引起了人们对从查询中学习神经网络这一经典问题的兴趣。在这项工作中,我们给出了首个多项式时间算法:在给定网络黑盒访问权限的前提下,学习任意带ReLU激活的单隐层神经网络。形式化地,我们证明了若$F$是带ReLU激活的任意单隐层神经网络,则存在一个查询复杂度和运行时间均为所有参数的多项式的算法,该算法输出一个网络$F'$,在高斯测度下相对于$F$实现较低的平方损失。虽然安全文献中的大量工作已经提出并实证证明了某些算法对该问题的有效性,但我们的算法是第一个即使对最坏情况的网络也具有完全多项式时间效率保证的算法(特别地,我们的算法在过参数化设置下依然成功)。 摘要:Model extraction attacks have renewed interest in the classic problem of learning neural networks from queries. In this work we give the first polynomial-time algorithm for learning arbitrary one hidden layer neural networks activations provided black-box access to the network. Formally, we show that if $F$ is an arbitrary one hidden layer neural network with ReLU activations, there is an algorithm with query complexity and running time that is polynomial in all parameters that outputs a network $F'$ achieving low square loss relative to $F$ with respect to the Gaussian measure. While a number of works in the security literature have proposed and empirically demonstrated the effectiveness of certain algorithms for this problem, ours is the first with fully polynomial-time guarantees of efficiency even for worst-case networks (in particular our algorithm succeeds in the overparameterized setting).

【2】 SustainBench: Benchmarks for Monitoring the Sustainable Development Goals with Machine Learning 标题:SustainBench:用机器学习监测可持续发展目标的基准 链接:https://arxiv.org/abs/2111.04724

作者:Christopher Yeh,Chenlin Meng,Sherrie Wang,Anne Driscoll,Erik Rozi,Patrick Liu,Jihyeon Lee,Marshall Burke,David B. Lobell,Stefano Ermon 机构:Caltech, Stanford, UC Berkeley, David Lobell 备注:NeurIPS 2021 (Track on Datasets and Benchmarks) 摘要:联合国可持续发展目标(SDG)的进展因缺乏关键环境和社会经济指标的数据而受到阻碍,这些指标历史上都来自于时间和空间覆盖较少的地面调查。机器学习的最新进展使得利用卫星或社交媒体等丰富、频繁更新且全球可用的数据来洞察可持续发展目标的进展成为可能。尽管早期结果很有希望,但迄今为止,使用此类数据进行可持续发展目标测量的方法在很大程度上是在不同的数据集上进行评估,或者使用不一致的评估指标,这使得人们很难理解绩效是否在改善,以及在哪些地方进行额外的研究最有成效。此外,处理卫星和地面调查数据需要机器学习社区中许多人所缺乏的领域知识。在本文中,我们介绍SustainBench,这是一个涵盖7项可持续发展目标的15项基准任务的集合,包括与经济发展、农业、卫生、教育、水和卫生、气候行动和陆地生活相关的任务。15项任务中有11项任务的数据集首次公开发布。SustainBench的目标是:(1)降低机器学习社区的进入壁垒,为衡量和实现可持续发展目标做出贡献;(2) 为评估各种SDG任务的机器学习模型提供标准基准;(3)鼓励开发新的机器学习方法,改进模型性能有助于实现可持续发展目标。 摘要:Progress toward the United Nations Sustainable Development Goals (SDGs) has been hindered by a lack of data on key environmental and socioeconomic indicators, which historically have come from ground surveys with sparse temporal and spatial coverage. Recent advances in machine learning have made it possible to utilize abundant, frequently-updated, and globally available data, such as from satellites or social media, to provide insights into progress toward SDGs. Despite promising early results, approaches to using such data for SDG measurement thus far have largely evaluated on different datasets or used inconsistent evaluation metrics, making it hard to understand whether performance is improving and where additional research would be most fruitful. Furthermore, processing satellite and ground survey data requires domain knowledge that many in the machine learning community lack. In this paper, we introduce SustainBench, a collection of 15 benchmark tasks across 7 SDGs, including tasks related to economic development, agriculture, health, education, water and sanitation, climate action, and life on land. Datasets for 11 of the 15 tasks are released publicly for the first time. Our goals for SustainBench are to (1) lower the barriers to entry for the machine learning community to contribute to measuring and achieving the SDGs; (2) provide standard benchmarks for evaluating machine learning models on tasks across a variety of SDGs; and (3) encourage the development of novel machine learning methods where improved model performance facilitates progress towards the SDGs.

【3】 SMU: smooth activation function for deep networks using smoothing maximum technique 标题:SMU:基于平滑极大值技术的深层网络平滑激活函数 链接:https://arxiv.org/abs/2111.04682

作者:Koushik Biswas,Sandeep Kumar,Shilpak Banerjee,Ashish Kumar Pandey 备注:7 pages 摘要:深度学习研究人员对提出能够提升网络性能的两种新型激活函数非常感兴趣。选择一个好的激活函数对提高网络性能有重大影响。人工设计的激活函数是神经网络模型中最常见的选择。尽管ReLU有一些严重的缺点,但由于其简单性,ReLU是深度学习社区中最常见的选择。在本文中,我们提出了一种新的激活函数,它基于对已知激活函数(如Leaky ReLU)的光滑近似,我们称之为光滑最大单元(SMU)。用SMU替换ReLU后,我们在ShuffleNet V2模型上使CIFAR100数据集的性能提高了6.22%。 摘要:Deep learning researchers have a keen interest in proposing two new novel activation functions which can boost network performance. A good choice of activation function can have significant consequences in improving network performance. A handcrafted activation is the most common choice in neural network models. ReLU is the most common choice in the deep learning community due to its simplicity though ReLU has some serious drawbacks. In this paper, we have proposed a new novel activation function based on approximation of known activation functions like Leaky ReLU, and we call this function Smooth Maximum Unit (SMU). Replacing ReLU by SMU, we have got 6.22% improvement in the CIFAR100 dataset with the ShuffleNet V2 model.
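其核心想法是用 erf 给 max 运算做光滑近似:由 max(a,b) = (a + b + |a-b|)/2,把绝对值换成光滑替代 |z| ≈ z·erf(μz),即可得到 Leaky ReLU(即 max(x, αx))的光滑版本。下面是按这一推导写出的示意实现;α、μ 的具体取值及是否可学习以论文为准。

```python
import torch

def smu(x, alpha=0.25, mu=1.0):
    """光滑最大单元示意:对 max(x, alpha*x) 的 erf 光滑近似。
    max(a, b) = (a + b + |a-b|)/2,其中 |z| 用 z*erf(mu*z) 替代。"""
    z = (1 - alpha) * x
    return ((1 + alpha) * x + z * torch.erf(mu * z)) / 2

x = torch.linspace(-3, 3, 7)
print(smu(x))   # mu 越大越接近 Leaky ReLU(负半轴斜率趋于 alpha)
```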

【4】 Approximate Neural Architecture Search via Operation Distribution Learning 标题:基于操作分布学习的近似神经结构搜索 链接:https://arxiv.org/abs/2111.04670

作者:Xingchen Wan,Binxin Ru,Pedro M. Esperança,Fabio M. Carlucci 机构:Pedro M. Esperanc¸a, Huawei Noah’s Ark Lab, London, UK, Machine Learning Research Group, University of Oxford, Oxford, UK 备注:WACV 2022. 10 pages, 3 figures and 5 tables (15 pages, 7 figures and 6 tables including appendices) 摘要:神经架构搜索(NAS)的标准范例是搜索具有特定操作和连接的完全确定性架构。在这项工作中,我们提出搜索最优操作分布,从而提供一个随机的近似解,可用于任意长度的结构样本。我们提出并表明,给定一个架构单元,其性能在很大程度上取决于所使用操作的比率,而不是典型搜索空间中的任何特定连接模式;也就是说,操作顺序的微小变化通常是无关紧要的。这种直觉与任何特定的搜索策略都是正交的,可以应用于各种NAS算法。通过对4个数据集和4种NAS技术(贝叶斯优化、可微搜索、局部搜索和随机搜索)的广泛验证,我们表明操作分布(1)具有足够的辨别能力,能够可靠地识别解决方案,(2)比传统编码更容易优化,在性能上几乎没有成本的情况下实现大的速度提升。事实上,这种简单的直觉大大降低了当前方法的成本,并有可能使NAS在更广泛的应用中得到应用。 摘要:The standard paradigm in Neural Architecture Search (NAS) is to search for a fully deterministic architecture with specific operations and connections. In this work, we instead propose to search for the optimal operation distribution, thus providing a stochastic and approximate solution, which can be used to sample architectures of arbitrary length. We propose and show, that given an architectural cell, its performance largely depends on the ratio of used operations, rather than any specific connection pattern in typical search spaces; that is, small changes in the ordering of the operations are often irrelevant. This intuition is orthogonal to any specific search strategy and can be applied to a diverse set of NAS algorithms. Through extensive validation on 4 data-sets and 4 NAS techniques (Bayesian optimisation, differentiable search, local search and random search), we show that the operation distribution (1) holds enough discriminating power to reliably identify a solution and (2) is significantly easier to optimise than traditional encodings, leading to large speed-ups at little to no cost in performance. Indeed, this simple intuition significantly reduces the cost of current approaches and potentially enable NAS to be used in a broader range of applications.

【5】 DeepSteal: Advanced Model Extractions Leveraging Efficient Weight Stealing in Memories 标题:DeepSteal:利用内存中高效权重窃取的高级模型提取 链接:https://arxiv.org/abs/2111.04625

作者:Adnan Siraj Rakin,Md Hafizul Islam Chowdhuryy,Fan Yao,Deliang Fan 机构: Co-First Authors with Equal Contributions, Department of Electrical,Computer and Energy Engineering, Arizona State University, Department of Electrical and Computer Engineering, University of Central Florida 摘要:深度神经网络(DNN)的最新发展使其在多个安全敏感领域得到了广泛部署。资源密集型训练的需要和对有价值的特定领域训练数据的使用,使这些模型成为模型所有者的顶级知识产权(IP)。DNN隐私的主要威胁之一是模型提取攻击,即对手试图窃取DNN模型中的敏感信息。最近的研究表明,基于硬件的侧信道攻击可以揭示关于DNN模型的内部知识(例如模型架构),但是,迄今为止,现有攻击无法提取详细的模型参数(例如权重/偏置)。在这项工作中,我们首次提出了一种先进的模型提取攻击框架DeepSteal,该框架借助内存侧信道攻击有效地窃取DNN权重。我们提出的DeepSteal包括两个关键阶段。首先,我们开发了一种新的权重比特信息提取方法HammerLeak,采用基于rowhammer的硬件故障技术作为信息泄漏向量。HammerLeak利用了几种针对DNN应用定制的新型系统级技术,以实现快速高效的权重窃取。其次,我们提出了一种带平均聚类权重惩罚的新型替代模型训练算法,该算法有效利用部分泄漏的比特信息,并生成目标受害者模型的替代原型。我们在三个流行的图像数据集(如CIFAR-10/100/GTSRB)和四个DNN体系结构(如ResNet-18/34/Wide ResNet/VGG-11)上评估了这种替代模型提取方法。提取的替代模型在CIFAR-10数据集的深度残差网络上已成功实现90%以上的测试精度。此外,我们提取的替代模型还可以生成有效的对抗性输入样本来愚弄受害者模型。 摘要:Recent advancements of Deep Neural Networks (DNNs) have seen widespread deployment in multiple security-sensitive domains. The need of resource-intensive training and use of valuable domain-specific training data have made these models a top intellectual property (IP) for model owners. One of the major threats to the DNN privacy is model extraction attacks where adversaries attempt to steal sensitive information in DNN models. Recent studies show hardware-based side channel attacks can reveal internal knowledge about DNN models (e.g., model architectures) However, to date, existing attacks cannot extract detailed model parameters (e.g., weights/biases). In this work, for the first time, we propose an advanced model extraction attack framework DeepSteal that effectively steals DNN weights with the aid of memory side-channel attack. Our proposed DeepSteal comprises two key stages. Firstly, we develop a new weight bit information extraction method, called HammerLeak, through adopting the rowhammer based hardware fault technique as the information leakage vector. HammerLeak leverages several novel system-level techniques tailed for DNN applications to enable fast and efficient weight stealing. Secondly, we propose a novel substitute model training algorithm with Mean Clustering weight penalty, which leverages the partial leaked bit information effectively and generates a substitute prototype of the target victim model. We evaluate this substitute model extraction method on three popular image datasets (e.g., CIFAR-10/100/GTSRB) and four DNN architectures (e.g., ResNet-18/34/Wide-ResNet/VGG-11). The extracted substitute model has successfully achieved more than 90 % test accuracy on deep residual networks for the CIFAR-10 dataset. Moreover, our extracted substitute model could also generate effective adversarial input samples to fool the victim model.

【6】 Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems 标题:稀疏奖励协作多智能体问题的变分自动课程学习 链接:https://arxiv.org/abs/2111.04613

作者:Jiayu Chen,Yuanxin Zhang,Yuanfan Xu,Huimin Ma,Huazhong Yang,Jiaming Song,Yu Wang,Yi Wu 机构: Tsinghua University, Shanghai Qi Zhi Institute, University of Science and Technology Beijing, Stanford University 备注:In NeurIPS 2021 摘要:我们介绍了一种课程学习算法——变分自动课程学习(VACL),用于解决具有挑战性的目标条件(goal-conditioned)协作多智能体强化学习问题。我们通过变分视角来阐释这一范式:学习目标可以分解为两项,即在当前任务分布上的任务学习,以及向新任务分布的课程更新。对第二项的局部优化表明,课程应当把训练任务由易到难逐步扩展。我们的VACL算法通过两个实用组件(任务扩展和实体推进)实现了这种变分范式,这两个组件在任务配置以及任务中实体数量两个维度上生成训练课程。实验结果表明,VACL解决了一系列具有大量智能体的稀疏奖励问题。特别是,仅使用单台台式机,VACL在simple-spread基准上以100个智能体达到98%的覆盖率,并复现了最早在OpenAI捉迷藏项目中展示的坡道(ramp)使用行为。我们的项目网站位于https://sites.google.com/view/vacl-neurips-2021. 摘要:We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current task distribution, and curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity progression, which produces training curricula over both the task configurations as well as the number of entities in the task. Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents. Particularly, using a single desktop machine, VACL achieves 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project. Our project website is at https://sites.google.com/view/vacl-neurips-2021.

【7】 On the Stochastic Stability of Deep Markov Models 标题:关于深度马尔可夫模型的随机稳定性 链接:https://arxiv.org/abs/2111.04601

作者:Ján Drgoňa,Sayak Mukherjee,Jiaxin Zhang,Frank Liu,Mahantesh Halappanavar 机构: Pacific Northwest National Laboratory, Richland, Washington, USA, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA 备注:35th Conference on Neural Information Processing Systems (NeurIPS 2021), Sydney, Australia 摘要:深度马尔可夫模型(DMM)是一种生成模型,它是马尔可夫模型在表示、学习和推理问题上的可扩展和表达性推广。然而,这类模型的基本随机稳定性保证尚未得到彻底的研究。在本文中,我们给出了动态系统中DMM随机稳定性的充分条件,并提出了一种基于深度神经网络建模的概率映射收缩的稳定性分析方法。我们将神经网络权值的频谱特性与所使用的不同类型的激活函数联系起来,研究了高斯分布DMM的稳定性和整体动态行为。基于这一理论,我们提出了一些实用的方法来设计具有保证稳定性的约束DMM。我们通过使用所提出的稳定性约束进行直观的数值实验来验证我们的理论结果。 摘要:Deep Markov models (DMM) are generative models that are scalable and expressive generalization of Markov models for representation, learning, and inference problems. However, the fundamental stochastic stability guarantees of such models have not been thoroughly investigated. In this paper, we provide sufficient conditions of DMM's stochastic stability as defined in the context of dynamical systems and propose a stability analysis method based on the contraction of probabilistic maps modeled by deep neural networks. We make connections between the spectral properties of neural network's weights and different types of used activation functions on the stability and overall dynamic behavior of DMMs with Gaussian distributions. Based on the theory, we propose a few practical methods for designing constrained DMMs with guaranteed stability. We empirically substantiate our theoretical results via intuitive numerical experiments using the proposed stability constraints.
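摘要将稳定性与权重的谱性质及激活函数类型联系起来。下面给出一个与之一致的、教科书式的充分条件示意(并非论文原式):若DMM的转移映射$f$由$L$层前馈网络给出,则

```latex
\mathrm{Lip}(f) \;\le\; \prod_{l=1}^{L} L_{\sigma}\,\lVert W_l \rVert_2 \;<\; 1
```

其中$\lVert W_l\rVert_2$是第$l$层权重的谱范数,$L_{\sigma}$是激活函数的Lipschitz常数(ReLU、tanh等为1)。该乘积小于1时映射为压缩映射,这正对应摘要中“通过约束概率映射的压缩性来设计保证稳定的DMM”的思路。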

【8】 Information-Theoretic Bayes Risk Lower Bounds for Realizable Models 标题:可实现模型的信息论Bayes风险下界 链接:https://arxiv.org/abs/2111.04579

作者:Matthew Nokleby,Ahmad Beirami 机构:Lily AI, Mountain View, CA, Facebook AI, Menlo Park, CA 摘要:我们推导了可实现机器学习模型的贝叶斯风险和泛化误差的信息论下界。特别地,我们采用这样一种分析:模型参数的率失真函数对训练样本和模型参数之间所需的互信息给出界,从而把模型学习到给定的Bayes风险约束之内。对于可实现的模型,我们证明了率失真函数和互信息都具有便于分析的表达式。对于参数(大致)满足下Lipschitz条件的模型,我们给出率失真函数的下界;而对于VC类,互信息的上界为$d_\mathrm{vc}\log(n)$。当这两个界相匹配时,零一损失下的Bayes风险的收敛速度不快于$\Omega(d_\mathrm{vc}/n)$,在对数因子范围内与已知的外界和极小极大下界一致。我们也考虑了标签噪声的影响,给出了训练和/或测试样本被污染时的下界。 摘要:We derive information-theoretic lower bounds on the Bayes risk and generalization error of realizable machine learning models. In particular, we employ an analysis in which the rate-distortion function of the model parameters bounds the required mutual information between the training samples and the model parameters in order to learn a model up to a Bayes risk constraint. For realizable models, we show that both the rate distortion function and mutual information admit expressions that are convenient for analysis. For models that are (roughly) lower Lipschitz in their parameters, we bound the rate distortion function from below, whereas for VC classes, the mutual information is bounded above by $d_\mathrm{vc}\log(n)$. When these conditions match, the Bayes risk with respect to the zero-one loss scales no faster than $\Omega(d_\mathrm{vc}/n)$, which matches known outer bounds and minimax lower bounds up to logarithmic factors. We also consider the impact of label noise, providing lower bounds when training and/or test samples are corrupted.

【9】 Improved Regularization and Robustness for Fine-tuning in Neural Networks 标题:神经网络精调的改进正则化和鲁棒性 链接:https://arxiv.org/abs/2111.04578

作者:Dongyue Li,Hongyang R. Zhang 备注:22 pages, 6 figures, 11 tables 摘要:一种广泛使用的迁移学习算法是微调,即在目标任务上使用少量标记数据对预先训练的模型进行微调。当预训练模型的容量远远大于目标数据集的大小时,微调容易过拟合并“记忆”训练标签。因此,一个重要的问题是如何正则化微调并确保其对噪声的鲁棒性。为了解决这个问题,我们首先分析微调的泛化特性。我们提出了一个PAC-Bayes泛化界,它取决于微调过程中每一层的移动距离和微调模型的噪声稳定性。我们对这些量进行了实证测量。在分析的基础上,我们提出了正则化自标签——正则化与自标签方法之间的插值,包括(i)分层正则化以约束每层中的移动距离;(ii)自标签校正与标签重新加权,用于校正模型置信度高的误标数据点,并对置信度较低的数据点重新加权。我们使用多个预训练模型架构,在大量图像和文本数据集上验证了我们的方法。对于七个图像分类任务,我们的方法将基线方法平均提高了1.76%,在一个小样本(few-shot)分类任务上提高了0.75%。当目标数据集包含噪声标签时,在两种噪声设置下,我们的方法比基线方法平均高出3.56%。 摘要:A widely used algorithm for transfer learning is fine-tuning, where a pre-trained model is fine-tuned on a target task with a small amount of labeled data. When the capacity of the pre-trained model is much larger than the size of the target data set, fine-tuning is prone to overfitting and "memorizing" the training labels. Hence, an important question is to regularize fine-tuning and ensure its robustness to noise. To address this question, we begin by analyzing the generalization properties of fine-tuning. We present a PAC-Bayes generalization bound that depends on the distance traveled in each layer during fine-tuning and the noise stability of the fine-tuned model. We empirically measure these quantities. Based on the analysis, we propose regularized self-labeling -- the interpolation between regularization and self-labeling methods, including (i) layer-wise regularization to constrain the distance traveled in each layer; (ii) self label-correction and label-reweighting to correct mislabeled data points (that the model is confident) and reweight less confident data points. We validate our approach on an extensive collection of image and text data sets using multiple pre-trained model architectures. Our approach improves baseline methods by 1.76% (on average) for seven image classification tasks and 0.75% for a few-shot classification task. When the target data set includes noisy labels, our approach outperforms baseline methods by 3.56% on average in two noisy settings.
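其中“分层正则化”可以理解为在损失中加入每层参数相对预训练权重的移动距离惩罚。下面是一个极简 PyTorch 示意(统一的惩罚系数 beta 为假设的超参,论文中可能逐层设置):

```python
import torch

def layerwise_distance_penalty(model, pretrained_state, beta=1e-3):
    """对每一层累加 beta * ||W_l - W_l^pre||_2^2,约束微调时各层的移动距离。"""
    penalty = torch.zeros(())
    for name, p in model.named_parameters():
        penalty = penalty + beta * (p - pretrained_state[name]).pow(2).sum()
    return penalty

# 用法示意:total_loss = task_loss + layerwise_distance_penalty(model, pre_sd)
# 其中 pre_sd = {k: v.clone().detach() for k, v in model.state_dict().items()}
# 需在微调开始前保存。
```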

【10】 Addressing Privacy Threats from Machine Learning 标题:应对来自机器学习的隐私威胁 链接:https://arxiv.org/abs/2111.04439

作者:Mary Anne Smart 机构:Department of Computer Science & Engineering, University of California San Diego 备注:3 pages. Human Centered AI Workshop @ NeurIPS 2021 accepted submission 摘要:每年在NeurIPS,机器学习研究人员都会聚集一堂,讨论机器学习在公共卫生、灾难应对、气候变化、教育等领域令人兴奋的应用。然而,这些研究人员中的许多人对机器学习在监控中的应用表示越来越关注(Nanayakkara等人,2021年)。本文简要概述了抵制这些监视技术的策略,并呼吁机器学习和人机交互研究人员加强合作,以应对这些技术带来的威胁。 摘要:Every year at NeurIPS, machine learning researchers gather and discuss exciting applications of machine learning in areas such as public health, disaster response, climate change, education, and more. However, many of these same researchers are expressing growing concern about applications of machine learning for surveillance (Nanayakkara et al., 2021). This paper presents a brief overview of strategies for resisting these surveillance technologies and calls for greater collaboration between machine learning and human-computer interaction researchers to address the threats that these technologies pose.

【11】 SEOFP-NET: Compression and Acceleration of Deep Neural Networks for Speech Enhancement Using Sign-Exponent-Only Floating-Points 标题:SEOFP-NET:基于符号指数浮点的深层神经网络语音增强压缩与加速 链接:https://arxiv.org/abs/2111.04436

作者:Yu-Chen Lin,Cheng Yu,Yi-Te Hsu,Szu-Wei Fu,Yu Tsao,Tei-Wei Kuo 摘要:众多压缩和加速策略在计算机视觉和语音信号处理等领域的分类任务中取得了显著效果。然而,同样的策略在回归任务上表现不尽如人意,因为回归任务与分类任务的性质不同。本文提出了一种新的仅符号-指数浮点网络(SEOFP-NET)技术,用于压缩语音增强(语音信号处理中的一项回归任务)模型的大小并加快其推理时间。该方法通过在训练过程中量化单精度浮点参数的小数位(fraction bits)来压缩基于深度神经网络(DNN)的语音增强模型的大小。在推理实现之前,对训练好的SEOFP-NET模型中的所有参数进行轻微调整,并用整数加法器替换浮点乘法器,从而加快推理时间。为检验泛化性,SEOFP-NET技术被应用于不同语料库、不同模型结构下的多种语音增强任务。实验结果表明,SEOFP-NET模型的大小可以显著压缩多达81.249%,而不会明显降低其语音增强性能,推理时间与基线模型相比可加快至1.212倍。结果还验证了所提出的SEOFP-NET可以与其他效率策略配合,在模型压缩上实现协同效应。此外,在用户研究实验中应用了恰可察觉差异(JND),对语音增强的听感影响进行了统计分析。结果表明,听者无法轻易区分基线模型处理的增强语音信号与所提出的SEOFP-NET处理的信号。 摘要:Numerous compression and acceleration strategies have achieved outstanding results on classification tasks in various fields, such as computer vision and speech signal processing. Nevertheless, the same strategies have yielded ungratified performance on regression tasks because the nature between these and classification tasks differs. In this paper, a novel sign-exponent-only floating-point network (SEOFP-NET) technique is proposed to compress the model size and accelerate the inference time for speech enhancement, a regression task of speech signal processing. The proposed method compressed the sizes of deep neural network (DNN)-based speech enhancement models by quantizing the fraction bits of single-precision floating-point parameters during training. Before inference implementation, all parameters in the trained SEOFP-NET model are slightly adjusted to accelerate the inference time by replacing the floating-point multiplier with an integer-adder. For generalization, the SEOFP-NET technique is introduced to different speech enhancement tasks in speech signal processing with different model architectures under various corpora. The experimental results indicate that the size of SEOFP-NET models can be significantly compressed by up to 81.249% without noticeably downgrading their speech enhancement performance, and the inference time can be accelerated to 1.212x compared with the baseline models. The results also verify that the proposed SEOFP-NET can cooperate with other efficiency strategies to achieve a synergy effect for model compression. In addition, the just noticeable difference (JND) was applied to the user study experiment to statistically analyze the effect of speech enhancement on listening. The results indicate that the listeners cannot facilely differentiate between the enhanced speech signals processed by the baseline model and the proposed SEOFP-NET.
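“量化单精度浮点的小数位”可以用位掩码直观表达:把 float32 视作 32 位整数,将低位尾数清零,只保留符号位、指数位和若干前导小数位。下面是一个 NumPy 示意(并非论文官方实现;函数名与参数为说明性假设):

```python
import numpy as np

def quantize_fraction_bits(x, kept_fraction_bits=0):
    """把 float32 的 23 位尾数中的低位清零;kept_fraction_bits=0 时
    只剩符号位 + 指数位,此时乘法可退化为指数相加(整数加法器)。"""
    bits = np.asarray(x, dtype=np.float32).view(np.uint32)
    drop = 23 - kept_fraction_bits
    mask = np.uint32((0xFFFFFFFF << drop) & 0xFFFFFFFF)
    return (bits & mask).view(np.float32)

w = np.array([0.4387, -1.771, 3.1416], dtype=np.float32)
print(quantize_fraction_bits(w, 0))  # 每个值被截断为幅值不大于它的 2 的幂次
```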

【12】 Off-policy Imitation Learning from Visual Inputs 标题:基于视觉输入的离策略模仿学习 链接:https://arxiv.org/abs/2111.04345

作者:Zhihao Cheng,Li Shen,Dacheng Tao 机构:JD Explore Academy & The University of Sydney 摘要:近来,模仿学习(IL)中利用专家状态的各类成功应用层出不穷。然而,另一种IL设置——来自视觉输入的IL(ILfVI)——通过利用在线视觉资源,在现实中有更大的应用前景,但由于同策略(on-policy)的学习方式和高维视觉输入,其数据效率低、性能欠佳。我们提出了OPIfVI(来自视觉输入的离策略模仿),它由离策略学习方式、数据增广和编码器技术组成,分别应对上述挑战。更具体地说,为了提高数据效率,OPIfVI以离策略的方式进行IL,使采样数据可以被多次使用。此外,我们通过谱归一化增强了OPIfVI的稳定性,以减轻离策略训练的副作用。我们认为,导致ILfVI性能不佳的核心因素是智能体无法从视觉输入中提取有意义的特征。因此,OPIfVI借鉴计算机视觉中的数据增广来帮助训练编码器,以便更好地从视觉输入中提取特征。此外,还针对编码器设计了一种特殊的梯度反向传播结构,以稳定编码器训练。最后,我们通过使用DeepMind Control Suite的大量实验证明,无论提供的是视觉演示还是视觉观测,OPIfVI都能够实现专家级性能,并优于现有基线。 摘要:Recently, various successful applications utilizing expert states in imitation learning (IL) have been witnessed. However, another IL setting -- IL from visual inputs (ILfVI), which has a greater promise to be applied in reality by utilizing online visual resources, suffers from low data-efficiency and poor performance resulted from an on-policy learning manner and high-dimensional visual inputs. We propose OPIfVI (Off-Policy Imitation from Visual Inputs), which is composed of an off-policy learning manner, data augmentation, and encoder techniques, to tackle the mentioned challenges, respectively. More specifically, to improve data-efficiency, OPIfVI conducts IL in an off-policy manner, with which sampled data can be used multiple times. In addition, we enhance the stability of OPIfVI with spectral normalization to mitigate the side-effect of off-policy training. The core factor, contributing to the poor performance of ILfVI, that we think is the agent could not extract meaningful features from visual inputs. Hence, OPIfVI employs data augmentation from computer vision to help train encoders that can better extract features from visual inputs. In addition, a specific structure of gradient backpropagation for the encoder is designed to stabilize the encoder training. At last, we demonstrate that OPIfVI is able to achieve expert-level performance and outperform existing baselines no matter visual demonstrations or visual observations are provided via extensive experiments using DeepMind Control Suite.
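摘要提到用谱归一化稳定离策略训练。在 PyTorch 中,给判别器(或评价网络)的各层加谱归一化只需一行包装;下面是一个极简示意(网络结构与维度为假设):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# 对判别器的每个线性层施加谱归一化,约束其 Lipschitz 常数,
# 从而缓解离策略(off-policy)训练中的不稳定
discriminator = nn.Sequential(
    spectral_norm(nn.Linear(64, 256)), nn.ReLU(),
    spectral_norm(nn.Linear(256, 256)), nn.ReLU(),
    spectral_norm(nn.Linear(256, 1)),
)
```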

【13】 Assessing learned features of Deep Learning applied to EEG 标题:深度学习的学习特征评估在脑电中的应用 链接:https://arxiv.org/abs/2111.04309

作者:Dung Truong,Scott Makeig,Arnaud Delorme 机构:SCCN, INC, UCSD, La Jolla CA, USA 摘要:卷积神经网络(CNN)在许多与计算机视觉相关的任务上取得了令人印象深刻的性能,如目标检测、图像识别、图像检索等。这些成就得益于CNN在学习具有深层神经元结构和迭代训练过程的鉴别特征方面的卓越能力。这启发了EEG研究界采用CNN来执行EEG分类任务。然而,CNN学习到的特征无法立即解释,导致对CNN的内部工作机制缺乏了解。为了提高CNN的可解释性,应用CNN可视化方法将内部特征转化为可视模式,用于CNN层的定性分析。计算机视觉文献中提出了许多CNN可视化方法来解释CNN网络的结构、操作和语义概念,但在脑电数据分析中的应用受到限制。在这项工作中,我们使用3种不同的方法从训练于原始EEG数据的CNN中提取EEG相关特征:每个分类类别的最佳样本、激活最大化和反向卷积。我们将这些方法应用于一个高性能的深度学习模型,该模型具有脑电性别分类任务的最新性能,并且表明该模型在θ频带上具有差异。我们表明,可视化的CNN模型可以揭示有趣的脑电图结果。使用这些工具,EEG研究人员使用深度学习可以更好地识别学习到的EEG特征,可能识别新的类别相关生物标记物。 摘要:Convolutional Neural Networks (CNNs) have achieved impressive performance on many computer vision related tasks, such as object detection, image recognition, image retrieval, etc. These achievements benefit from the CNNs' outstanding capability to learn discriminative features with deep layers of neuron structures and iterative training process. This has inspired the EEG research community to adopt CNN in performing EEG classification tasks. However, CNNs learned features are not immediately interpretable, causing a lack of understanding of the CNNs' internal working mechanism. To improve CNN interpretability, CNN visualization methods are applied to translate the internal features into visually perceptible patterns for qualitative analysis of CNN layers. Many CNN visualization methods have been proposed in the Computer Vision literature to interpret the CNN network structure, operation, and semantic concept, yet applications to EEG data analysis have been limited. In this work we use 3 different methods to extract EEG-relevant features from a CNN trained on raw EEG data: optimal samples for each classification category, activation maximization, and reverse convolution. We applied these methods to a high-performing Deep Learning model with state-of-the-art performance for an EEG sex classification task, and show that the model features a difference in the theta frequency band. We show that visualization of a CNN model can reveal interesting EEG results. Using these tools, EEG researchers using Deep Learning can better identify the learned EEG features, possibly identifying new class relevant biomarkers.
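文中用到的“激活最大化”可以概括为:固定网络权重,对输入做梯度上升,使某个输出单元的激活最大,从而可视化该单元“最想看到”的输入模式。下面是一个通用 PyTorch 示意;模型结构、输入尺寸与正则系数均为假设,仅说明方法骨架:

```python
import torch

def activation_maximization(model, input_shape, unit, steps=200, lr=0.1):
    """对输入做梯度上升以最大化 model 输出的第 unit 个单元,
    返回使该单元响应最强的输入模式(如某类 EEG 片段)。"""
    x = torch.zeros(1, *input_shape, requires_grad=True)
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = -model(x)[0, unit] + 1e-3 * x.pow(2).mean()  # L2 项防止发散
        loss.backward()
        opt.step()
    return x.detach()

# 用法示例:一个假想的 2 类 EEG 分类器,输入为 (通道=32, 采样点=128)
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(32 * 128, 2))
pattern = activation_maximization(model, (32, 128), unit=0)
print(pattern.shape)
```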

【14】 Learning to Rectify for Robust Learning with Noisy Labels 标题:学习纠错以实现带噪声标签的鲁棒学习 链接:https://arxiv.org/abs/2111.04239

作者:Haoliang Sun,Chenhui Guo,Qi Wei,Zhongyi Han,Yilong Yin 机构:School of Software, Shandong University, Jinan, China 摘要:在实际应用中,标签噪声显著降低了深层模型的泛化能力。在训练神经网络时,人们设计了有效的策略和方法(例如重新加权或损失校正)来减轻标签噪声的负面影响。这些现有工作通常依赖于预先指定的体系结构并需要手动调整额外的超参数。在本文中,我们提出了翘曲概率推理(WarPI),在元学习场景中对分类网络的训练过程进行自适应校正。与确定性模型不同,WarPI通过学习一个摊销(amortized)元网络而被表述为层次概率模型,能够化解样本歧义,因而对严重的标签噪声更加鲁棒。与现有的直接从损失中生成权重值的近似加权函数不同,我们的元网络学习从logit和标签的输入中估计校正向量,能够充分利用其中蕴含的信息。这为校正分类网络的学习过程提供了一种有效的方法,带来了泛化能力的显著提升。此外,将校正向量建模为潜变量并学习元网络,可以无缝地集成到分类网络的SGD优化中。我们在四个带噪声标签的鲁棒学习基准上评估了WarPI,并在各种噪声类型下达到了新的最先进水平。大量的研究和分析也证明了我们模型的有效性。 摘要:Label noise significantly degrades the generalization ability of deep models in applications. Effective strategies and approaches, \textit{e.g.} re-weighting, or loss correction, are designed to alleviate the negative impact of label noise when training a neural network. Those existing works usually rely on the pre-specified architecture and manually tuning the additional hyper-parameters. In this paper, we propose warped probabilistic inference (WarPI) to achieve adaptively rectifying the training procedure for the classification network within the meta-learning scenario. In contrast to the deterministic models, WarPI is formulated as a hierarchical probabilistic model by learning an amortization meta-network, which can resolve sample ambiguity and be therefore more robust to serious label noise. Unlike the existing approximated weighting function of directly generating weight values from losses, our meta-network is learned to estimate a rectifying vector from the input of the logits and labels, which has the capability of leveraging sufficient information lying in them. This provides an effective way to rectify the learning procedure for the classification network, demonstrating a significant improvement of the generalization ability. Besides, modeling the rectifying vector as a latent variable and learning the meta-network can be seamlessly integrated into the SGD optimization of the classification network. We evaluate WarPI on four benchmarks of robust learning with noisy labels and achieve the new state-of-the-art under variant noise types. Extensive study and analysis also demonstrate the effectiveness of our model.

【15】 Plumber: Diagnosing and Removing Performance Bottlenecks in Machine Learning Data Pipelines 标题:水管工:诊断和消除机器学习数据管道中的性能瓶颈 链接:https://arxiv.org/abs/2111.04131

作者:Michael Kuchnik,Ana Klimovic,Jiri Simsa,George Amvrosiadis,Virginia Smith 机构: the input datapipeline repeatedly produces batches of data with a delay ofat least 1ms after the acceleratormodel is able to consumeWork started while at Google 1Carnegie Mellon University 2ETH Z¨urich 3Google 摘要:输入管道负责接收和转换输入数据,是训练机器学习(ML)模型的重要组成部分。然而,实现高效的输入管道是一个挑战,因为它需要对细粒度剖析信息中的并行性、异步性和可变性进行推理。我们对谷歌数据中心超过200万个ML作业的分析表明,相当一部分模型训练作业可以从更快的输入数据管道中获益。同时,我们的分析表明,大多数作业不会使主机硬件饱和,这指向了基于软件的瓶颈。受这些发现的启发,我们提出了Plumber,一种在ML输入管道中查找瓶颈的工具。Plumber使用可扩展且可解释的运行分析模型,在主机资源约束下自动调整并行性、预取和缓存。在五条具有代表性的ML管道上,Plumber对配置错误的管道获得高达46倍的加速比。通过自动缓存,与最先进的调优器相比,Plumber可以获得超过40%的端到端加速。 摘要:Input pipelines, which ingest and transform input data, are an essential part of training Machine Learning (ML) models. However, it is challenging to implement efficient input pipelines, as it requires reasoning about parallelism, asynchrony, and variability in fine-grained profiling information. Our analysis of over 2 million ML jobs in Google datacenters reveals that a significant fraction of model training jobs could benefit from faster input data pipelines. At the same time, our analysis reveals that most jobs do not saturate host hardware, pointing in the direction of software-based bottlenecks. Motivated by these findings, we propose Plumber, a tool for finding bottlenecks in ML input pipelines. Plumber uses an extensible and interprettable operational analysis analytical model to automatically tune parallelism, prefetching, and caching under host resource constraints. Across five representative ML pipelines, Plumber obtains speedups of up to 46x for misconfigured pipelines. By automating caching, Plumber obtains end-to-end speedups of over 40% compared to state-of-the-art tuners.

【16】 NeurInt : Learning to Interpolate through Neural ODEs 标题:NeurInt:通过神经常微分方程学习插值 链接:https://arxiv.org/abs/2111.04123

作者:Avinandan Bose,Aniket Das,Yatin Dandi,Piyush Rai 机构:Indian Institute of Technology Kanpur 备注:Accepted (Spotlight paper) at the NeurIPS 2021 Workshop on the Symbiosis of Deep Learning and Differential Equations (DLDE) 摘要:广泛的应用需要学习图像生成模型,其潜在空间能有效捕获数据分布中存在的高级变化因素。模型通过其潜在空间表示这种变化的程度,可以通过其在图像之间平滑插值的能力来判断。然而,大多数生成模型把一个固定的先验映射到生成图像,导致插值轨迹缺乏平滑性,且包含质量下降的图像。在这项工作中,我们提出了一种新的生成模型,它以一对源图像和目标图像为条件,学习插值轨迹上的灵活非参数先验。我们没有依赖确定性插值方法(如潜在空间中的线性或球面插值),而是设计了一个框架,使用潜在二阶神经常微分方程来学习两幅给定图像之间的轨迹分布。通过重建损失与对抗损失的混合组合,生成器被训练成将这些轨迹上的采样点映射到真实图像序列,这些序列从源图像平滑过渡到目标图像。通过全面的定性和定量实验,我们证明了该方法在生成更高质量图像方面的有效性,以及为任意一对真实源图像和目标图像学习平滑插值轨迹上多样分布的能力。 摘要:A wide range of applications require learning image generation models whose latent space effectively captures the high-level factors of variation present in the data distribution. The extent to which a model represents such variations through its latent space can be judged by its ability to interpolate between images smoothly. However, most generative models mapping a fixed prior to the generated images lead to interpolation trajectories lacking smoothness and containing images of reduced quality. In this work, we propose a novel generative model that learns a flexible non-parametric prior over interpolation trajectories, conditioned on a pair of source and target images. Instead of relying on deterministic interpolation methods (such as linear or spherical interpolation in latent space), we devise a framework that learns a distribution of trajectories between two given images using Latent Second-Order Neural Ordinary Differential Equations. Through a hybrid combination of reconstruction and adversarial losses, the generator is trained to map the sampled points from these trajectories to sequences of realistic images that smoothly transition from the source to the target image. Through comprehensive qualitative and quantitative experiments, we demonstrate our approach's effectiveness in generating images of improved quality as well as its ability to learn a diverse distribution over smooth interpolation trajectories for any pair of real source and target images.

【17】 Developing neural machine translation models for Hungarian-English 标题:匈牙利语-英语神经机器翻译模型的开发 链接:https://arxiv.org/abs/2111.04099

作者:Attila Nagy 机构:Judit Ács 摘要:我使用Hunglish2语料库,为英语-匈牙利语和匈牙利语-英语的神经机器翻译任务训练模型。这项工作的主要贡献是评估NMT模型训练期间不同的数据增广方法。我提出了5种不同的结构感知增广方法,即不随机选择单词进行遮蔽或替换,而是以句子的依存树作为增广的基础。论文开篇对神经网络、序列建模、神经机器翻译、依存句法分析和数据增广等方面进行了详细的文献综述。在对Hunglish2语料库进行详细的探索性数据分析和预处理之后,我使用所提出的数据增广技术进行了实验。匈牙利语-英语方向的最佳模型达到33.9的BLEU分数,英语-匈牙利语方向的最佳模型达到28.6的BLEU分数。 摘要:I train models for the task of neural machine translation for English-Hungarian and Hungarian-English, using the Hunglish2 corpus. The main contribution of this work is evaluating different data augmentation methods during the training of NMT models. I propose 5 different augmentation methods that are structure-aware, meaning that instead of randomly selecting words for blanking or replacement, the dependency tree of sentences is used as a basis for augmentation. I start my thesis with a detailed literature review on neural networks, sequential modeling, neural machine translation, dependency parsing and data augmentation. After a detailed exploratory data analysis and preprocessing of the Hunglish2 corpus, I perform experiments with the proposed data augmentation techniques. The best model for Hungarian-English achieves a BLEU score of 33.9, while the best model for English-Hungarian achieves a BLEU score of 28.6.
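“结构感知增广”的一个直观做法是:不随机挑词,而是沿依存树选择整棵子树进行遮蔽。下面用 spaCy 给出一个带假设的示意(需已安装 en_core_web_sm 模型;依存标签的选择和占位符仅为示例,论文中的 5 种方法细节以原文为准):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def blank_subtree(sentence, dep_label="dobj", mask="<BLANK>"):
    """找到带指定依存标签的词,把它的整棵子树替换为一个占位符。"""
    doc = nlp(sentence)
    for tok in doc:
        if tok.dep_ == dep_label:
            span = {t.i for t in tok.subtree}
            out, masked = [], False
            for t in doc:
                if t.i in span:
                    if not masked:      # 整棵子树只输出一次占位符
                        out.append(mask)
                        masked = True
                else:
                    out.append(t.text)
            return " ".join(out)
    return sentence                      # 未命中时原样返回

print(blank_subtree("The translator reads the long Hungarian sentence."))
# 预期类似:"The translator reads <BLANK> ."
```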

【18】 V-MAO: Generative Modeling for Multi-Arm Manipulation of Articulated Objects 标题:V-MAO:关节物体多臂操作的产生式建模 链接:https://arxiv.org/abs/2111.03987

作者:Xingyu Liu,Kris M. Kitani 机构:Robotics Institute, Carnegie Mellon University 备注:CoRL 2021 摘要:操纵铰接对象通常需要多个机器人手臂。让多个机器人手臂协同完成关节对象上的操纵任务是一项挑战。在本文中,我们提出了$\textbf{V-MAO}$,一个学习关节对象多臂操作的框架。我们的框架包括一个变分生成模型,学习每个机器人手臂在物体刚性部分上的接触点分布。训练信号是通过与仿真环境的交互获得的,仿真环境通过规划和关节对象的以对象为中心的控制的新形式实现。我们在定制的MuJoCo仿真环境中部署了我们的框架,并证明我们的框架在六个不同的对象和两个不同的机器人上实现了较高的成功率。我们还表明,生成建模可以有效地学习关节对象上的接触点分布。 摘要:Manipulating articulated objects requires multiple robot arms in general. It is challenging to enable multiple robot arms to collaboratively complete manipulation tasks on articulated objects. In this paper, we present $\textbf{V-MAO}$, a framework for learning multi-arm manipulation of articulated objects. Our framework includes a variational generative model that learns contact point distribution over object rigid parts for each robot arm. The training signal is obtained from interaction with the simulation environment which is enabled by planning and a novel formulation of object-centric control for articulated objects. We deploy our framework in a customized MuJoCo simulation environment and demonstrate that our framework achieves a high success rate on six different objects and two different robots. We also show that generative modeling can effectively learn the contact point distribution on articulated objects.

【19】 Machine Learning-Assisted E-jet Printing of Organic Flexible Biosensors 标题:基于机器学习的有机柔性生物传感器电喷打印 链接:https://arxiv.org/abs/2111.03985

作者:Mehran Abbasi Shirsavar,Mehrnoosh Taghavimehr,Lionel J. Ouedraogo,Mojan Javaheripi,Nicole N. Hashemi,Farinaz Koushanfar,Reza Montazami 机构:a Department of Mechanical Engineering, Iowa State University, Ames, IA , USA, b Department of Electrical and Computer Engineering, University of California, San Diego, CA , USA, c Department of Mechanical Engineering, Stanford University, Stanford, CA , USA 摘要:电流体动力喷射(e-jet)打印技术可实现复杂软电子设备的高分辨率打印。因此,它在成为打印软电子设备的常规技术方面具有无与伦比的潜力。在本研究中,研究了电喷印刷电路的导电性与关键打印参数(喷嘴速度、油墨流速和电压)的关系。收集的实验数据集随后用于训练机器学习算法,以建立能够实时预测印刷电路特性的模型。比较精度参数以评估监督分类模型。由于决策树方法的准确率无法超过71%,我们在数据集上采用了更先进的算法来提高模型的精度。根据F值(F-measure),K-NN模型(k=10)和随机森林是分类电极导电性的最佳方法。AdaBoost集成学习在10至15棵树的设置下达到了最高准确率(87%)。 摘要:Electrohydrodynamic-jet (e-jet) printing technique enables the high-resolution printing of complex soft electronic devices. As such, it has an unmatched potential for becoming the conventional technique for printing soft electronic devices. In this study, the electrical conductivity of the e-jet printed circuits was studied as a function of key printing parameters (nozzle speed, ink flow rate, and voltage). The collected experimental dataset was then used to train a machine learning algorithm to establish models capable of predicting the characteristics of the printed circuits in real-time. Precision parameters were compared to evaluate the supervised classification models. Since decision tree methods could not increase the accuracy higher than 71%, more advanced algorithms are performed on our dataset to improve the precision of model. According to F-measure values, the K-NN model (k=10) and random forest are the best methods to classify the conductivity of electrodes. The highest accuracy of AdaBoost ensemble learning has resulted in the range of 10-15 trees (87%).
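摘要中比较的几种分类器都可以用 scikit-learn 直接对照复现。下面的示意中特征与标签均为虚构数据,k=10 与 15 棵树对应文中的设置:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.uniform(size=(120, 3))                   # 假想特征:喷嘴速度、油墨流速、电压
y = (X @ [0.5, -0.8, 1.2] > 0.4).astype(int)     # 假想的导电性类别标签

for name, clf in [
    ("k-NN (k=10)", KNeighborsClassifier(n_neighbors=10)),
    ("RandomForest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("AdaBoost (15 trees)", AdaBoostClassifier(n_estimators=15, random_state=0)),
]:
    print(name, cross_val_score(clf, X, y, cv=5).mean().round(3))
```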

【20】 A Probit Tensor Factorization Model For Relational Learning 标题:一种用于关系学习的Probit张量分解模型 链接:https://arxiv.org/abs/2111.03943

作者:Ye Liu,Rui Song,Wenbin Lu 机构:Department of Statistics, North Carolina State University, and 备注:30 pages 摘要:随着知识图的激增,具有复杂多关系结构的数据建模在统计关系学习领域受到越来越多的关注。统计关系学习最重要的目标之一是链接预测,即预测知识图中是否存在某些关系。大量的模型和算法被提出来进行链路预测,其中张量因子分解方法在计算效率和预测精度方面达到了最先进的水平。然而,现有张量因子分解模型的一个共同缺点是,缺失关系和不存在关系被以相同的方式处理,这导致信息丢失。为了解决这个问题,我们提出了一个带有probit链接的二元张量分解模型,它不仅继承了经典张量分解模型的计算效率,而且还考虑了关系数据的二元性。我们提出的probit张量因子分解(PTF)模型在预测精度和可解释性方面都显示出优势 摘要:With the proliferation of knowledge graphs, modeling data with complex multirelational structure has gained increasing attention in the area of statistical relational learning. One of the most important goals of statistical relational learning is link prediction, i.e., predicting whether certain relations exist in the knowledge graph. A large number of models and algorithms have been proposed to perform link prediction, among which tensor factorization method has proven to achieve state-of-the-art performance in terms of computation efficiency and prediction accuracy. However, a common drawback of the existing tensor factorization models is that the missing relations and non-existing relations are treated in the same way, which results in a loss of information. To address this issue, we propose a binary tensor factorization model with probit link, which not only inherits the computation efficiency from the classic tensor factorization model but also accounts for the binary nature of relational data. Our proposed probit tensor factorization (PTF) model shows advantages in both the prediction accuracy and interpretability
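“带probit链接的二元张量分解”可以写成 P(Y_ijk = 1) = Φ(⟨u_i, v_j, w_k⟩),其中 Φ 为标准正态分布的CDF。下面是CP分解形式下该概率模型的示意,因子维度与记号均为说明性假设:

```python
import numpy as np
from scipy.stats import norm

def probit_cp_prob(U, V, W):
    """P(Y_ijk = 1) = Phi( sum_r U[i,r] * V[j,r] * W[k,r] )。
    U、V 为实体因子,W 为关系因子(秩 r 的 CP 分解)。"""
    logits = np.einsum("ir,jr,kr->ijk", U, V, W)
    return norm.cdf(logits)

rng = np.random.default_rng(0)
U, V, W = rng.normal(size=(5, 3)), rng.normal(size=(5, 3)), rng.normal(size=(4, 3))
P = probit_cp_prob(U, V, W)            # 形状 (5, 5, 4):实体 × 实体 × 关系
Y = (rng.uniform(size=P.shape) < P)    # 按该概率生成二元关系张量
print(P.shape, Y.mean().round(2))
```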

【21】 Physics-Informed Neural Operator for Learning Partial Differential Equations 标题:学习偏微分方程的物理信息神经算子 链接:https://arxiv.org/abs/2111.03794

作者:Zongyi Li,Hongkai Zheng,Nikola Kovachki,David Jin,Haoxuan Chen,Burigede Liu,Kamyar Azizzadenesheli,Anima Anandkumar 摘要:机器学习方法最近在求解偏微分方程(PDE)方面显示出了良好的前景。它们可以分为两大类:近似解函数和学习解算子。物理信息神经网络(PINN)是前者的一个例子,而傅立叶神经算子(FNO)是后者的一个例子。这两种方法都有缺点。PINN中的优化具有挑战性且容易失败,尤其是在多尺度动态系统中。FNO不会受到这种优化问题的影响,因为它在给定数据集上执行监督学习,但获取此类数据可能过于昂贵或不可行。在这项工作中,我们提出了物理信息神经算子(PINO),将算子学习与函数优化框架相结合。这种综合方法在收敛速度和精度上均优于PINN和FNO模型。在算子学习阶段,PINO在参数化PDE族的多个实例上学习解算子。在测试时优化阶段,PINO针对所查询的PDE实例优化预训练的算子拟设(ansatz)。实验表明,PINO在许多流行的PDE族上优于以往的ML方法,同时保持了FNO相对于传统求解器的显著加速。特别是,PINO精确地求解了具有挑战性的长时间瞬态流和Kolmogorov流,而其他基线ML方法在这些问题上无法收敛。 摘要:Machine learning methods have recently shown promise in solving partial differential equations (PDEs). They can be classified into two broad categories: approximating the solution function and learning the solution operator. The Physics-Informed Neural Network (PINN) is an example of the former while the Fourier neural operator (FNO) is an example of the latter. Both these approaches have shortcomings. The optimization in PINN is challenging and prone to failure, especially on multi-scale dynamic systems. FNO does not suffer from this optimization issue since it carries out supervised learning on a given dataset, but obtaining such data may be too expensive or infeasible. In this work, we propose the physics-informed neural operator (PINO), where we combine the operating-learning and function-optimization frameworks. This integrated approach improves convergence rates and accuracy over both PINN and FNO models. In the operator-learning phase, PINO learns the solution operator over multiple instances of the parametric PDE family. In the test-time optimization phase, PINO optimizes the pre-trained operator ansatz for the querying instance of the PDE. Experiments show PINO outperforms previous ML methods on many popular PDE families while retaining the extraordinary speed-up of FNO compared to solvers. In particular, PINO accurately solves challenging long temporal transient flows and Kolmogorov flows where other baseline ML methods fail to converge.
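PINO的要点是把“算子学习的数据损失”(FNO式监督项)与“物理残差损失”(PINN式约束项)相加。一个与摘要一致的极简示意如下,其中算子模型 G 与残差函数 pde_residual 均为占位假设:

```python
import torch

def pino_loss(G, a, u_true, pde_residual, w_data=1.0, w_pde=1.0):
    """G: 神经算子,把网格上的输入函数 a 映到解 u_pred;
    pde_residual: 在 (a, u_pred) 上计算 PDE 残差(可用谱方法或自动微分)。"""
    u_pred = G(a)
    loss_data = torch.mean((u_pred - u_true) ** 2)        # 监督算子损失(FNO 式)
    loss_pde = torch.mean(pde_residual(a, u_pred) ** 2)   # 物理约束损失(PINN 式)
    return w_data * loss_data + w_pde * loss_pde

# 测试时优化的用法示意:在查询实例上关闭数据项,仅用物理残差微调预训练算子,
# 例如 loss = pino_loss(G, a_query, torch.zeros_like(a_query), residual,
#                      w_data=0.0, w_pde=1.0)
```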

【22】 MQBench: Towards Reproducible and Deployable Model Quantization Benchmark 标题:MQBench:迈向可重现、可部署的模型量化基准 链接:https://arxiv.org/abs/2111.03759

作者:Yuhang Li,Mingzhu Shen,Jian Ma,Yan Ren,Mingxin Zhao,Qi Zhang,Ruihao Gong,Fengwei Yu,Junjie Yan 备注:Accepted by 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks 摘要:模型量化已成为加速深度学习推理不可缺少的技术。虽然研究人员在继续推动量化算法的前沿,但现有的量化工作往往不可复现、难以部署。这是因为研究人员没有选择一致的训练管道,并且忽略了硬件部署的要求。在这项工作中,我们提出了模型量化基准(MQBench),这是第一次尝试对模型量化算法的可复现性和可部署性进行评估、分析和基准测试。我们为实际部署选择了多个不同的平台,包括CPU、GPU、ASIC、DSP,并在统一的训练管道下评估大量最先进的量化算法。MQBench就像一座桥梁,连接算法和硬件。我们进行了全面的分析,得到了许多符合直觉或有悖直觉的见解。通过对齐训练设置,我们发现现有算法在传统学术赛道上的性能大致相同;而对于硬件可部署的量化,则存在尚未解决的巨大精度差距。令人惊讶的是,在MQBench中,没有一种现有算法能够赢得所有挑战,我们希望这项工作能够启发未来的研究方向。 摘要:Model quantization has emerged as an indispensable technique to accelerate deep learning inference. While researchers continue to push the frontier of quantization algorithms, existing quantization work is often unreproducible and undeployable. This is because researchers do not choose consistent training pipelines and ignore the requirements for hardware deployments. In this work, we propose Model Quantization Benchmark (MQBench), a first attempt to evaluate, analyze, and benchmark the reproducibility and deployability for model quantization algorithms. We choose multiple different platforms for real-world deployments, including CPU, GPU, ASIC, DSP, and evaluate extensive state-of-the-art quantization algorithms under a unified training pipeline. MQBench acts like a bridge to connect the algorithm and the hardware. We conduct a comprehensive analysis and find considerable intuitive or counter-intuitive insights. By aligning the training settings, we find existing algorithms have about the same performance on the conventional academic track. While for the hardware-deployable quantization, there is a huge accuracy gap which remains unsettled. Surprisingly, no existing algorithm wins every challenge in MQBench, and we hope this work could inspire future research directions.
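MQBench 所评测的量化感知训练算法大多建立在"伪量化"(quantize-dequantize)算子之上。下面给出该算子的一个通用极简示意(均匀量化、按张量 min/max 校准等均为本文假设,并非 MQBench 的官方 API):

```python
# 极简示意:均匀伪量化:前向引入量化误差,数值仍以浮点表示
import torch

def fake_quantize(x, n_bits=8):
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)  # 按张量 min/max 校准(假设)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale  # 反量化,保留量化带来的舍入误差

w = torch.randn(4, 4)
print((fake_quantize(w) - w).abs().max())  # 量化误差大致不超过 scale/2
```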

【23】 Toward Learning Human-aligned Cross-domain Robust Models by Countering Misaligned Features 标题:通过对抗未对齐特征来学习人对齐的跨域鲁棒模型 链接:https://arxiv.org/abs/2111.03740

作者:Haohan Wang,Zeyi Huang,Hanlin Zhang,Eric Xing 机构:School of Computer Science, Carnegie Mellon University 备注:10 pages of main contents 摘要:机器学习已经证明了对i.i.d.数据的显著预测精度,但当使用来自另一个分布的数据进行测试时,精度通常会下降。在本文中,我们从另一个角度来看待这个问题,假设精度下降的原因在于模型所依赖的特征与数据标注者判断这两个数据集相似的方式没有对齐。我们将这些特征称为未对齐特征。我们利用未对齐特征与标签的关联方式,将传统的泛化误差界扩展为适用于该设定的新界。我们的分析为这个问题提供了一套技术,这些技术自然与鲁棒机器学习文献中的许多已有方法相联系。我们还比较了这些方法的经验表现,并展示了将这些已有技术组合使用时的性能。 摘要:Machine learning has demonstrated remarkable prediction accuracy over i.i.d. data, but the accuracy often drops when tested with data from another distribution. In this paper, we aim to offer another view of this problem in a perspective assuming the reason behind this accuracy drop is the reliance of models on the features that are not aligned well with how a data annotator considers similar across these two datasets. We refer to these features as misaligned features. We extend the conventional generalization error bound to a new one for this setup with the knowledge of how the misaligned features are associated with the label. Our analysis offers a set of techniques for this problem, and these techniques are naturally linked to many previous methods in robust machine learning literature. We also compared the empirical strength of these methods and demonstrated the performance when these previous techniques are combined.

【24】 Frugal Machine Learning 标题:节约型机器学习 链接:https://arxiv.org/abs/2111.03731

作者:Mikhail Evchenko,Joaquin Vanschoren,Holger H. Hoos,Marc Schoenauer,Michèle Sebag 摘要:机器学习已经成为越来越多系统和应用的核心,随着可穿戴设备和物联网的迅速崛起,机器学习将变得更加普遍。在大多数机器学习应用中,主要关注的是所获得结果的质量(例如,预测精度),因此收集了大量数据,需要大量计算资源来构建模型。然而,在许多情况下,建立大型集中式数据存储库是不可行或不切实际的。例如,在个人健康方面,隐私问题可能会阻碍详细个人数据的共享。在这种情况下,机器学习最好在可穿戴设备本身上进行,而这带来了严苛的计算限制,例如智能手表的电池容量。因此,本文研究节俭学习,旨在用最少的资源建立尽可能精确的模型。我们从节俭的视角考察了各种学习算法,分析了它们在各种数据集上的准确性/运行时性能。随后,通过在智能手表中实现最有前途的算法,并让它们在手表上学习活动识别模型,在真实场景中对这些算法进行了评估。 摘要:Machine learning, already at the core of increasingly many systems and applications, is set to become even more ubiquitous with the rapid rise of wearable devices and the Internet of Things. In most machine learning applications, the main focus is on the quality of the results achieved (e.g., prediction accuracy), and hence vast amounts of data are being collected, requiring significant computational resources to build models. In many scenarios, however, it is infeasible or impractical to set up large centralized data repositories. In personal health, for instance, privacy issues may inhibit the sharing of detailed personal data. In such cases, machine learning should ideally be performed on wearable devices themselves, which raises major computational limitations such as the battery capacity of smartwatches. This paper thus investigates frugal learning, aimed to build the most accurate possible models using the least amount of resources. A wide range of learning algorithms is examined through a frugal lens, analyzing their accuracy/runtime performance on a wide range of data sets. The most promising algorithms are thereafter assessed in a real-world scenario by implementing them in a smartwatch and letting them learn activity recognition models on the watch itself.

【25】 Reconstructing Training Data from Diverse ML Models by Ensemble Inversion 标题:基于集成反演的不同ML模型训练数据重构 链接:https://arxiv.org/abs/2111.03702

作者:Qian Wang,Daniel Kurz 备注:9 pages, 8 figures, WACV 2022 摘要:模型反演(MI)是指对手滥用对经过训练的机器学习(ML)模型的访问权,试图推断其原始训练数据的敏感信息,引起了越来越多的研究关注。在MI期间,受攻击训练模型(MUA)通常被冻结并用于指导生成器(如生成性对抗网络(GAN))的训练,以重建该模型原始训练数据的分布。这可能会导致原始训练样本泄漏,如果成功,如果训练数据包含个人识别信息(PII),则数据集受试者的隐私将受到威胁。因此,深入研究MI技术的潜力对于相应防御技术的发展至关重要。基于单一模型的高质量训练数据重建具有挑战性。然而,现有的MI文献并未探讨联合瞄准多个模型,这可能会为对手提供额外的信息和不同的视角。我们提出了集合反演技术,该技术通过训练受集合(或集合)约束的生成器来估计原始训练数据的分布。与单个ML模型的MI相比,使用数据集实体的可区分特征生成的样本的质量显著提高。我们在没有任何数据集的情况下获得了高质量的结果,并展示了如何利用与假定训练数据相似的辅助数据集来改进结果。深入研究了集合中模型多样性的影响,并利用附加约束来鼓励对重建样本进行精确预测和高激活,从而更准确地重建训练图像。 摘要:Model Inversion (MI), in which an adversary abuses access to a trained Machine Learning (ML) model attempting to infer sensitive information about its original training data, has attracted increasing research attention. During MI, the trained model under attack (MUA) is usually frozen and used to guide the training of a generator, such as a Generative Adversarial Network (GAN), to reconstruct the distribution of the original training data of that model. This might cause leakage of original training samples, and if successful, the privacy of dataset subjects will be at risk if the training data contains Personally Identifiable Information (PII). Therefore, an in-depth investigation of the potentials of MI techniques is crucial for the development of corresponding defense techniques. High-quality reconstruction of training data based on a single model is challenging. However, existing MI literature does not explore targeting multiple models jointly, which may provide additional information and diverse perspectives to the adversary. We propose the ensemble inversion technique that estimates the distribution of original training data by training a generator constrained by an ensemble (or set) of trained models with shared subjects or entities. This technique leads to noticeable improvements of the quality of the generated samples with distinguishable features of the dataset entities compared to MI of a single ML model. We achieve high quality results without any dataset and show how utilizing an auxiliary dataset that's similar to the presumed training data improves the results. The impact of model diversity in the ensemble is thoroughly investigated and additional constraints are utilized to encourage sharp predictions and high activations for the reconstructed samples, leading to more accurate reconstruction of training images.
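下面给出集成反演核心思路的一个极简草图:训练生成器,使集成中的每个被攻击模型都对目标类别给出高置信预测。网络结构与损失形式均为本文假设,论文中的 GAN 框架以及对锐利预测、高激活的附加约束在此省略:

```python
# 极简示意:对模型集成做反演的生成器训练骨架(非论文官方实现)
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 784), nn.Tanh())
models = [nn.Sequential(nn.Linear(784, 10)) for _ in range(3)]  # 假设的已训练模型集成(此处仅随机初始化示意)
for m in models:
    m.requires_grad_(False)  # 被攻击模型保持冻结

target = torch.full((32,), 3)  # 希望重建其训练数据的目标类别
opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for _ in range(100):
    x = G(torch.randn(32, 64))
    # 要求集成中每个模型都对目标类别高置信,相当于综合多个模型的"视角"
    loss = sum(nn.functional.cross_entropy(m(x), target) for m in models) / len(models)
    opt.zero_grad(); loss.backward(); opt.step()
```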

【26】 Learning Filterbanks for End-to-End Acoustic Beamforming 标题:用于端到端声学波束形成的学习滤波器组 链接:https://arxiv.org/abs/2111.04614

作者:Samuele Cornell,Manuel Pariente,François Grondin,Stefano Squartini 机构:Italy, Université de Lorraine, CNRS, Inria, LORIA, France, Université de Sherbrooke, Canada 摘要:最近关于单声道声源分离的研究表明,使用具有短窗口的完全学习滤波器组可以提高性能。另一方面,众所周知,对于传统波束形成技术,性能随着分析窗口变长而提高。这也适用于大多数依赖深度神经网络(DNN)来估计空间协方差矩阵的混合神经波束形成方法。在这项工作中,我们试图弥合这两个世界之间的差距,探索完全端到端的混合神经波束形成:不使用短时傅里叶变换,而是让分析和合成滤波器组与DNN联合学习。具体而言,我们探讨了两种不同类型的学习滤波器组:完全学习型与解析型。我们使用最近的Clarity Challenge数据进行了详细分析,结果表明,对于短窗口,使用学习滤波器组有可能超过基于oracle mask的波束形成。 摘要:Recent work on monaural source separation has shown that performance can be increased by using fully learned filterbanks with short windows. On the other hand it is widely known that, for conventional beamforming techniques, performance increases with long analysis windows. This applies also to most hybrid neural beamforming methods which rely on a deep neural network (DNN) to estimate the spatial covariance matrices. In this work we try to bridge the gap between these two worlds and explore fully end-to-end hybrid neural beamforming in which, instead of using the Short-Time-Fourier Transform, also the analysis and synthesis filterbanks are learnt jointly with the DNN. In detail, we explore two different types of learned filterbanks: fully learned and analytic. We perform a detailed analysis using the recent Clarity Challenge data and show that by using learnt filterbanks it is possible to surpass oracle-mask based beamforming for short windows.

【27】 Lattice gauge symmetry in neural networks 标题:神经网络中的格点规范对称性 链接:https://arxiv.org/abs/2111.04389

作者:Matteo Favoni,Andreas Ipp,David I. Müller,Daniel Schuh 机构:Institute for Theoretical Physics, TU Wien, Wiedner Hauptstrasse, Tower B, Wien, Austria 备注:10 pages, 3 figures, proceedings for the 38th International Symposium on Lattice Field Theory (LATTICE21) 摘要:我们回顾了一种新的神经网络结构,称为格点规范等变卷积神经网络(L-CNN),它可以应用于格点规范理论中的一般机器学习问题,同时精确地保持规范对称性。我们讨论了规范等变的概念,并用它显式地构造了规范等变卷积层和双线性层。我们使用看似简单的非线性回归任务比较了L-CNN和非等变CNN的性能:L-CNN相比非等变CNN表现出更好的泛化能力,并在预测中达到了很高的准确度。 摘要:We review a novel neural network architecture called lattice gauge equivariant convolutional neural networks (L-CNNs), which can be applied to generic machine learning problems in lattice gauge theory while exactly preserving gauge symmetry. We discuss the concept of gauge equivariance which we use to explicitly construct a gauge equivariant convolutional layer and a bilinear layer. The performance of L-CNNs and non-equivariant CNNs is compared using seemingly simple non-linear regression tasks, where L-CNNs demonstrate generalizability and achieve a high degree of accuracy in their predictions compared to their non-equivariant counterparts.

【28】 Learning equilibria with personalized incentives in a class of nonmonotone games 标题:一类非单调博弈中具有个性化激励的学习均衡 链接:https://arxiv.org/abs/2111.03854

作者:Filippo Fabiani,Andrea Simonetto,Paul J. Goulart 机构:Department of Engineering Science, University of Oxford 摘要:我们考虑代理之间具有对称相互作用的二次、非单调广义纳什均衡问题,已知此类问题是势博弈。正如实际情况中可能发生的那样,我们设想了一个底层势函数的显式表达式不可用的场景,并设计了一个两层纳什均衡搜索算法。在该方案中,协调器迭代地整合带噪声的代理反馈,学习代理的伪梯度,然后为代理设计个性化的激励。在代理一方,它们接受这些个性化的激励,计算一个扩展博弈的解,然后将反馈度量返回给协调器。我们证明了在协调器采用标准学习策略的情况下,我们的算法返回一个均衡点,并在一个亚单调(hypomonotone)博弈的数值例子上验证了我们的结果。 摘要:We consider quadratic, nonmonotone generalized Nash equilibrium problems with symmetric interactions among the agents, which are known to be potential. As may happen in practical cases, we envision a scenario in which an explicit expression of the underlying potential function is not available, and we design a two-layer Nash equilibrium seeking algorithm. In the proposed scheme, a coordinator iteratively integrates the noisy agents' feedback to learn the pseudo-gradients of the agents, and then design personalized incentives for them. On their side, the agents receive those personalized incentives, compute a solution to an extended game, and then return feedback measures to the coordinator. We show that our algorithm returns an equilibrium in case the coordinator is endowed with standard learning policies, and corroborate our results on a numerical instance of a hypomonotone game.

【29】 Tradeoffs of Linear Mixed Models in Genome-wide Association Studies 标题:全基因组关联研究中线性混合模型的权衡 链接:https://arxiv.org/abs/2111.03739

作者:Haohan Wang,Bryon Aragam,Eric Xing 机构:School of Computer Science, Carnegie Mellon University, Booth School of Business, University of Chicago 备注:in final revision of Journal of Computational Biology 摘要:基于全基因组关联研究(GWAS)文献中众所周知的经验论点,我们研究了应用于GWAS的线性混合模型(LMM)的统计特性。首先,我们研究了LMM对在亲属关系矩阵中包含候选SNP的敏感性,这在实践中经常被用来加速计算。我们的结果揭示了包含候选SNP所产生误差的大小,为这种在速度与准确性之间进行权衡的技术提供了依据。其次,我们研究了混合模型如何纠正GWAS中的混杂因素,这被广泛认为是LMM优于传统方法的优势。我们考虑了两类混杂因素来源,即群体分层和环境混杂因素,并研究了实践中常用的不同方法如何以不同方式在这两类混杂因素之间进行权衡。 摘要:Motivated by empirical arguments that are well-known from the genome-wide association studies (GWAS) literature, we study the statistical properties of linear mixed models (LMMs) applied to GWAS. First, we study the sensitivity of LMMs to the inclusion of a candidate SNP in the kinship matrix, which is often done in practice to speed up computations. Our results shed light on the size of the error incurred by including a candidate SNP, providing a justification to this technique in order to trade-off velocity against veracity. Second, we investigate how mixed models can correct confounders in GWAS, which is widely accepted as an advantage of LMMs over traditional methods. We consider two sources of confounding factors, population stratification and environmental confounding factors, and study how different methods that are commonly used in practice trade-off these two confounding factors differently.

【30】 Explaining neural network predictions of material strength 标题:解释材料强度的神经网络预测 链接:https://arxiv.org/abs/2111.03729

作者:Ian A. Palmer,T. Nathan Mundhenk,Brian Gallagher,Yong Han 机构:Lawrence Livermore National Laboratory, Livermore, CA, USA 摘要:我们最近开发了一种深度学习方法,可以通过查看材料晶体的扫描电子显微镜(SEM)图像来确定材料的临界峰值应力。然而,目前还不清楚网络在做出预测时依据的是哪些图像特征。在计算机视觉中,通常使用可解释AI的显著性图来告诉人们图像的哪些部分对网络决策很重要。人们通常可以通过观察这些显著的位置来推断出重要的特征。然而,对于人类观察者来说,晶体的SEM图像比自然图像照片更为抽象。因此,很难判断最显著的位置上究竟哪些特征是重要的。为了解决这个问题,我们开发了一种方法,帮助我们将SEM图像中重要位置的特征映射到更容易解释、不那么抽象的纹理上。 摘要:We recently developed a deep learning method that can determine the critical peak stress of a material by looking at scanning electron microscope (SEM) images of the material's crystals. However, it has been somewhat unclear what kind of image features the network is keying off of when it makes its prediction. It is common in computer vision to employ an explainable AI saliency map to tell one what parts of an image are important to the network's decision. One can usually deduce the important features by looking at these salient locations. However, SEM images of crystals are more abstract to the human observer than natural image photographs. As a result, it is not easy to tell what features are important at the locations which are most salient. To solve this, we developed a method that helps us map features from important locations in SEM images to non-abstract textures that are easier to interpret.
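文中所说的可解释 AI 显著性图,最基础的一种形式是输入梯度显著性。下面给出一个极简示意(网络结构与输入尺寸均为演示用假设,并非论文所用模型):

```python
# 极简示意:输入梯度显著性图:哪些像素对预测的临界峰值应力影响最大
import torch
import torch.nn as nn

model = nn.Sequential(nn.Conv2d(1, 8, 3, padding=1), nn.ReLU(),
                      nn.Flatten(), nn.Linear(8 * 32 * 32, 1))
img = torch.randn(1, 1, 32, 32, requires_grad=True)  # 假设的 SEM 图像块
pred = model(img)                                    # 预测的峰值应力(标量)
pred.sum().backward()
saliency = img.grad.abs().squeeze()                  # 每个像素的重要性,形状 (32, 32)
```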

其他(32篇)

【1】 Bayesian Framework for Gradient Leakage 标题:梯度泄漏的贝叶斯框架 链接:https://arxiv.org/abs/2111.04706

作者:Mislav Balunović,Dimitar I. Dimitrov,Robin Staab,Martin Vechev 机构:Department of Computer Science, ETH Zurich 摘要:联邦学习是一种在不共享训练数据的情况下训练机器学习模型的方法。然而,最近的工作表明,它不能保证数据隐私,因为共享梯度仍然可能泄漏敏感信息。为了将梯度泄漏问题形式化,我们提出了一个理论框架,首次将贝叶斯最优对手分析为一个优化问题。我们证明了现有的泄漏攻击可以被看作是这个最优对手的近似,并且对输入数据和梯度的概率分布有不同的假设。我们的实验证实了Bayes最优对手在了解潜在分布时的有效性。此外,我们的实验评估表明,现有的几种启发式防御方法无法有效抵御更强的攻击,尤其是在训练过程的早期。因此,我们的研究结果表明,构建更有效的防御体系及其评估仍然是一个悬而未决的问题。 摘要:Federated learning is an established method for training machine learning models without sharing training data. However, recent work has shown that it cannot guarantee data privacy as shared gradients can still leak sensitive information. To formalize the problem of gradient leakage, we propose a theoretical framework that enables, for the first time, analysis of the Bayes optimal adversary phrased as an optimization problem. We demonstrate that existing leakage attacks can be seen as approximations of this optimal adversary with different assumptions on the probability distributions of the input data and gradients. Our experiments confirm the effectiveness of the Bayes optimal adversary when it has knowledge of the underlying distribution. Further, our experimental evaluation shows that several existing heuristic defenses are not effective against stronger attacks, especially early in the training process. Thus, our findings indicate that the construction of more effective defenses and their evaluation remains an open problem.
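作为背景,现有梯度泄漏攻击的通用骨架是"梯度匹配":优化一个虚拟输入,使其产生的梯度逼近被共享的真实梯度;按本文的框架,这类攻击可视为贝叶斯最优对手在特定分布假设下的近似。下面是一个极简草图(假设标签已知,模型与超参数均为演示用假设):

```python
# 极简示意:梯度匹配式梯度泄漏攻击的骨架(非任何论文的官方实现)
import torch
import torch.nn as nn

model = nn.Linear(20, 2)
x_true, y_true = torch.randn(1, 20), torch.tensor([1])
true_grads = torch.autograd.grad(
    nn.functional.cross_entropy(model(x_true), y_true), model.parameters())

x_hat = torch.randn(1, 20, requires_grad=True)  # 待恢复的虚拟输入
opt = torch.optim.Adam([x_hat], lr=0.1)
for _ in range(300):
    g = torch.autograd.grad(
        nn.functional.cross_entropy(model(x_hat), y_true),
        model.parameters(), create_graph=True)
    # 最小化虚拟梯度与真实共享梯度之间的距离
    loss = sum(((gi - ti) ** 2).sum() for gi, ti in zip(g, true_grads))
    opt.zero_grad(); loss.backward(); opt.step()
```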

【2】 Revisiting Methods for Finding Influential Examples 标题:重新审视寻找有影响力的例子的方法 链接:https://arxiv.org/abs/2111.04683

作者:Karthikeyan K,Anders Søgaard 机构:Duke University, University of Copenhagen 摘要:最近提出了几种基于实例的可解释性方法,用于寻找对测试时决策有影响的训练示例,包括影响函数(Influence Functions)、TraceIn、Representer点选择、Grad-Dot和Grad-Cos。通常,这些方法使用LOO影响(库克距离)作为金标准进行评估,或使用各种启发式方法进行评估。在本文中,我们证明了上述所有方法都是不稳定的,即对初始化、训练数据的顺序和批量大小极其敏感。我们认为,这是文献中普遍假设"示例的影响独立于模型状态和其他示例"所导致的自然结果,并论证了该假设并不成立。因此,LOO影响和启发式方法是衡量基于实例解释质量的糟糕指标;我们转而建议通过解释检测中毒攻击的能力来评估这类解释。此外,我们提供了一个简单而有效的基线来改进上述所有方法,并展示了它如何在下游任务上带来非常显著的改进。 摘要:Several instance-based explainability methods for finding influential training examples for test-time decisions have been proposed recently, including Influence Functions, TraceIn, Representer Point Selection, Grad-Dot, and Grad-Cos. Typically these methods are evaluated using LOO influence (Cook's distance) as a gold standard, or using various heuristics. In this paper, we show that all of the above methods are unstable, i.e., extremely sensitive to initialization, ordering of the training data, and batch size. We suggest that this is a natural consequence of how in the literature, the influence of examples is assumed to be independent of model state and other examples -- and argue it is not. We show that LOO influence and heuristics are, as a result, poor metrics to measure the quality of instance-based explanations, and instead propose to evaluate such explanations by their ability to detect poisoning attacks. Further, we provide a simple, yet effective baseline to improve all of the above methods and show how it leads to very significant improvements on downstream tasks.
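文中提到的 Grad-Dot 与 Grad-Cos,分别用训练样本与测试样本损失梯度的点积和余弦相似度为"影响力"打分。下面给出一个极简示意(模型与数据均为演示用假设):

```python
# 极简示意:Grad-Dot / Grad-Cos 影响力分数
import torch
import torch.nn as nn

model = nn.Linear(10, 2)

def loss_grad(x, y):
    g = torch.autograd.grad(nn.functional.cross_entropy(model(x), y), model.parameters())
    return torch.cat([p.reshape(-1) for p in g])   # 展平成一个梯度向量

g_test = loss_grad(torch.randn(1, 10), torch.tensor([0]))
g_train = loss_grad(torch.randn(1, 10), torch.tensor([1]))
grad_dot = torch.dot(g_train, g_test)
grad_cos = nn.functional.cosine_similarity(g_train, g_test, dim=0)
```

由于这些分数依赖当前模型参数,初始化、数据顺序与批量大小都会改变梯度,这也与摘要中"不稳定"的结论相呼应。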

【3】 Feature Concepts for Data Federative Innovations 标题:数据联合创新的特征概念 链接:https://arxiv.org/abs/2111.04505

作者:Yukio Ohsawa,Sae Kondo,Teruaki Hayashi 机构:Dept. Systems Innovation, School of Engineering, The University of Tokyo; Dept. Architecture, Graduate School of Engineering, Mie University 备注:13 pages, 7 figures 摘要:特征概念是数据联邦创新过程的本质,是对要从数据中获取的概念所建的模型。特征概念可以是简单的特征,例如单个变量,但更可能是对要从数据中获得的抽象信息的概念性说明。例如,树和簇分别是决策树学习和聚类的特征概念。到目前为止,满足数据用户需求的有用特征概念,是通过数据市场利益相关者之间的创造性沟通引出的。在这篇短文中,我们回顾了这种创造性的交流,展示了若干应用,例如市场和地震中的变化解释,并强调了在这些案例中引出的特征概念。 摘要:A feature concept, the essence of the data-federative innovation process, is presented as a model of the concept to be acquired from data. A feature concept may be a simple feature, such as a single variable, but is more likely to be a conceptual illustration of the abstract information to be obtained from the data. For example, trees and clusters are feature concepts for decision tree learning and clustering, respectively. Useful feature concepts for satisfying the requirements of users of data have been elicited so far via creative communication among stakeholders in the market of data. In this short paper, such a creative communication is reviewed, showing a couple of applications, for example, change explanation in markets and earthquakes, and highlighting the feature concepts elicited in these cases.

【4】 An Approach for Combining Multimodal Fusion and Neural Architecture Search Applied to Knowledge Tracing 标题:一种多模态融合与神经结构搜索相结合的知识追踪方法 链接:https://arxiv.org/abs/2111.04497

作者:Xinyi Ding,Tao Han,Yili Fang,Eric Larson 机构:School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, China, Lyle School of Engineering, Southern Methodist University, Dallas, USA 摘要:知识追踪是追踪学生对某一特定学习领域中不同技能掌握程度的过程。它是构建自适应学习系统的关键组件之一,已经被研究了几十年。在深度神经网络于其他领域取得成功的同时,我们也看到学习科学界的研究人员采取了类似的方法。然而,大多数现有的基于深度学习的知识追踪模型要么(1)只使用正确/错误的响应(忽略其他模态的有用信息),要么(2)依靠领域专家反复试错来设计其网络架构。在本文中,我们提出了一种基于序列模型的优化方法,将多模态融合和神经结构搜索结合在一个框架内。当只涉及一种模态时,常用的神经结构搜索技术可以被视为我们所提出方法的特例。我们进一步建议使用一种称为曲线下时间加权面积(加权AUC)的新度量来衡量序列模型随时间的表现。我们在两个公开的真实数据集上评估了我们的方法,结果表明所发现的模型能够实现优异的性能。与大多数现有工作不同,我们对模型预测进行了McNemar检验,结果具有统计显著性。 摘要:Knowledge Tracing is the process of tracking mastery level of different skills of students for a given learning domain. It is one of the key components for building adaptive learning systems and has been investigated for decades. In parallel with the success of deep neural networks in other fields, we have seen researchers take similar approaches in the learning science community. However, most existing deep learning based knowledge tracing models either: (1) only use the correct/incorrect response (ignoring useful information from other modalities) or (2) design their network architectures through domain expertise via trial and error. In this paper, we propose a sequential model based optimization approach that combines multimodal fusion and neural architecture search within one framework. The commonly used neural architecture search technique could be considered as a special case of our proposed approach when there is only one modality involved. We further propose to use a new metric called time-weighted Area Under the Curve (weighted AUC) to measure how a sequence model performs with time. We evaluate our methods on two public real datasets showing the discovered model is able to achieve superior performance. Unlike most existing works, we conduct McNemar's test on the model predictions and the results are statistically significant.

【5】 Identifying the Leading Factors of Significant Weight Gains Using a New Rule Discovery Method 标题:用一种新的规则发现方法识别显著增重的主导因素 链接:https://arxiv.org/abs/2111.04475

作者:Mina Samizadeh,Jessica C Jones-Smith,Bethany Sheridan,Rahmatollah Beheshti 机构: University of Delaware, Delaware, USA., University of Washington, Washington, USA., athenahealth, Inc., Massachusetts, USA. 备注:The code for this project is available on: this https URL 摘要:超重和肥胖仍然是一个主要的全球公共卫生问题,识别那些会增加未来体重增加风险的个体化模式,对于预防肥胖及众多与肥胖相关的后续疾病具有至关重要的作用。在这项工作中,我们使用一种规则发现方法来研究这个问题,所提出的方法在提供真正可解释性的同时,优化所识别模式的准确性(即经常正确)和支持度(即适用于大量样本)。具体而言,我们扩展了一种成熟的子组发现方法,以生成所需的X->Y类型规则,并展示如何从X侧提取最重要的特征,作为Y的最佳预测因子。在我们的肥胖问题中,X指从超大规模、多站点的EHR数据中提取的特征,Y表示体重显著增加。使用我们的方法,我们还广泛比较了22个阶层之间模式的差异和不平等,这22个阶层由个人的性别、年龄、种族、保险类型、社区类型和收入水平决定。通过一系列广泛的实验,我们展示了关于未来危险体重增加预测因子的新的补充发现。 摘要:Overweight and obesity remain a major global public health concern and identifying the individualized patterns that increase the risk of future weight gains has a crucial role in preventing obesity and numerous subsequent diseases associated with obesity. In this work, we use a rule discovery method to study this problem, by presenting an approach that offers genuine interpretability and concurrently optimizes the accuracy (being correct often) and support (applying to many samples) of the identified patterns. Specifically, we extend an established subgroup-discovery method to generate the desired rules of type X -> Y and show how top features can be extracted from the X side, functioning as the best predictors of Y. In our obesity problem, X refers to the extracted features from very large and multi-site EHR data, and Y indicates significant weight gains. Using our method, we also extensively compare the differences and inequities in patterns across 22 strata determined by the individual's gender, age, race, insurance type, neighborhood type, and income level. Through an extensive series of experiments, we show new and complementary findings regarding the predictors of future dangerous weight gains.

【6】 There is no Double-Descent in Random Forests 标题:随机森林中不存在双下降现象 链接:https://arxiv.org/abs/2111.04409

作者:Sebastian Buschjäger,Katharina Morik 机构:Artificial Intelligence Group, TU Dortmund, Germany 备注:11 pages, 3 figures, 3 algorithms 摘要:随机森林(RF)是机器学习领域中最先进的技术之一,在几乎零参数调整的情况下提供优异的性能。值得注意的是,RF似乎不受过拟合的影响,即使其基本构建块众所周知容易过拟合。最近,一项受到广泛关注的研究认为RF呈现出所谓的双下降曲线:首先,模型以U形曲线过拟合数据;然后,一旦达到某个模型复杂度,它会突然再次提高性能。在本文中,我们对"模型容量是解释RF成功的正确工具"这一观念提出质疑,并认为训练模型的算法比以前认为的更重要。我们证明了RF不呈现双下降曲线,而是呈现单下降曲线,因此它在经典意义上并没有过拟合。我们进一步提出了一种RF变体,虽然其决策边界近似于过拟合决策树(DT)的决策边界,但它同样不会过拟合。类似地,我们表明近似RF决策边界的DT仍然会过拟合。最后,我们研究了集成的多样性,以此作为估计其性能的工具。为此,我们引入了负相关森林(NCForest),它允许对集成中的多样性进行精确控制。我们发现,多样性和偏差确实对RF的性能有着至关重要的影响。多样性过低会使RF的性能退化为单棵树,而多样性过高则意味着大多数树不再产生正确的输出。然而,在这两个极端之间,存在一大片性能大致相同的不同折衷方案。因此,只要算法达到这种良好的折衷区间,偏差和多样性之间的具体折衷并不重要。 摘要:Random Forests (RFs) are among the state-of-the-art in machine learning and offer excellent performance with nearly zero parameter tuning. Remarkably, RFs seem to be impervious to overfitting even though their basic building blocks are well-known to overfit. Recently, a broadly received study argued that a RF exhibits a so-called double-descent curve: First, the model overfits the data in a u-shaped curve and then, once a certain model complexity is reached, it suddenly improves its performance again. In this paper, we challenge the notion that model capacity is the correct tool to explain the success of RF and argue that the algorithm which trains the model plays a more important role than previously thought. We show that a RF does not exhibit a double-descent curve but rather has a single descent. Hence, it does not overfit in the classic sense. We further present a RF variation that also does not overfit although its decision boundary approximates that of an overfitted DT. Similarly, we show that a DT which approximates the decision boundary of a RF will still overfit. Last, we study the diversity of an ensemble as a tool to estimate its performance. To do so, we introduce Negative Correlation Forest (NCForest) which allows for precise control over the diversity in the ensemble. We show that the diversity and the bias indeed have a crucial impact on the performance of the RF. Having too low diversity collapses the performance of the RF into a single tree, whereas having too much diversity means that most trees do not produce correct outputs anymore. However, in-between these two extremes we find a large range of different trade-offs with all roughly equal performance. Hence, the specific trade-off between bias and diversity does not matter as long as the algorithm reaches this good trade-off regime.

【7】 Defense Against Explanation Manipulation 标题:对解释操纵的防御 链接:https://arxiv.org/abs/2111.04303

作者:Ruixiang Tang,Ninghao Liu,Fan Yang,Na Zou,Xia Hu 机构:Texas A&M University, Rice University 摘要:可解释机器学习由于提高了模型的透明度而受到越来越多的关注,这有助于机器学习在实际应用中得到信任。然而,解释方法最近被证明是易受操纵的,我们可以很容易地改变模型的解释,同时保持其预测不变。为了解决这个问题,已经付出了一些努力来使用更稳定的解释方法或改变模型配置。在这项工作中,我们从训练的角度解决了这个问题,并提出了一种新的训练方案,称为解释上的对抗性训练(ATEX),以提高模型的内部解释稳定性,而不管采用何种解释方法。ATEX没有直接指定数据实例的解释值,而是只对模型预测提出要求,从而避免在优化过程中涉及二阶导数。作为进一步讨论,我们还发现解释稳定性与模型的另一个属性密切相关,即暴露于对抗性攻击的风险。通过实验,除了表明ATEX提高了模型对操纵目标解释的鲁棒性外,它还带来了其他好处,包括平滑解释和提高对抗性训练的效果(如果应用于模型)。 摘要:Explainable machine learning attracts increasing attention as it improves transparency of models, which is helpful for machine learning to be trusted in real applications. However, explanation methods have recently been demonstrated to be vulnerable to manipulation, where we can easily change a model's explanation while keeping its prediction constant. To tackle this problem, some efforts have been paid to use more stable explanation methods or to change model configurations. In this work, we tackle the problem from the training perspective, and propose a new training scheme called Adversarial Training on EXplanations (ATEX) to improve the internal explanation stability of a model regardless of the specific explanation method being applied. Instead of directly specifying explanation values over data instances, ATEX only puts requirement on model predictions which avoids involving second-order derivatives in optimization. As a further discussion, we also find that explanation stability is closely related to another property of the model, i.e., the risk of being exposed to adversarial attack. Through experiments, besides showing that ATEX improves model robustness against manipulation targeting explanation, it also brings additional benefits including smoothing explanations and improving the efficacy of adversarial training if applied to the model.

【8】 Identifying Best Fair Intervention 标题:确定最佳公平干预 链接:https://arxiv.org/abs/2111.04272

作者:Ruijiang Gao,Han Feng 摘要:We study the problem of best arm identification with a fairness constraint in a given causal model. The goal is to find a soft intervention on a given node to maximize the outcome while meeting a fairness constraint by counterfactual estimation with only partial knowledge of the causal model. The problem is motivated by ensuring fairness on an online marketplace. We provide theoretical guarantees on the probability of error and empirically examine the effectiveness of our algorithm with a two-stage baseline.

【9】 Personalized Benchmarking with the Ludwig Benchmarking Toolkit 标题:使用路德维希基准测试工具包进行个性化基准测试 链接:https://arxiv.org/abs/2111.04260

作者:Avanika Narayan,Piero Molino,Karan Goel,Willie Neiswanger,Christopher Ré 机构:Department of Computer Science, Stanford University 备注:14 pages, 14 figures, 35th Conference on Neural Information Processing Systems (NeurIPS 2021) Track on Datasets and Benchmarks 摘要:The rapid proliferation of machine learning models across domains and deployment settings has given rise to various communities (e.g. industry practitioners) which seek to benchmark models across tasks and objectives of personal value. Unfortunately, these users cannot use standard benchmark results to perform such value-driven comparisons as traditional benchmarks evaluate models on a single objective (e.g. average accuracy) and fail to facilitate a standardized training framework that controls for confounding variables (e.g. computational budget), making fair comparisons difficult. To address these challenges, we introduce the open-source Ludwig Benchmarking Toolkit (LBT), a personalized benchmarking toolkit for running end-to-end benchmark studies (from hyperparameter optimization to evaluation) across an easily extensible set of tasks, deep learning models, datasets and evaluation metrics. LBT provides a configurable interface for controlling training and customizing evaluation, a standardized training framework for eliminating confounding variables, and support for multi-objective evaluation. We demonstrate how LBT can be used to create personalized benchmark studies with a large-scale comparative analysis for text classification across 7 models and 9 datasets. We explore the trade-offs between inference latency and performance, relationships between dataset attributes and performance, and the effects of pretraining on convergence and robustness, showing how LBT can be used to satisfy various benchmarking objectives.

【10】 A Novel Data Pre-processing Technique: Making Data Mining Robust to Different Units and Scales of Measurement 标题:一种新的数据预处理技术:使数据挖掘对不同测量单位和尺度具有鲁棒性 链接:https://arxiv.org/abs/2111.04253

作者:Arbind Agrahari Baniya,Sunil Aryal,Santosh KC 机构:Deakin University, Geelong, Victoria, Australia, University of South Dakota, Vermillion, SD, USA 备注:None 摘要:Many existing data mining algorithms use feature values directly in their model, making them sensitive to units/scales used to measure/represent data. Pre-processing of data based on rank transformation has been suggested as a potential solution to overcome this issue. However, the resulting data after pre-processing with rank transformation is uniformly distributed, which may not be very useful in many data mining applications. In this paper, we present a better and effective alternative based on ranks over multiple sub-samples of data. We call the proposed pre-processing technique ARES (Average Rank over an Ensemble of Sub-samples). Our empirical results of widely used data mining algorithms for classification and anomaly detection in a wide range of data sets suggest that ARES results in more consistent task-specific outcomes across various algorithms and data sets. In addition to this, it results in better or competitive outcome most of the time compared to the most widely used min-max normalisation and the traditional rank transformation.
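按照摘要的描述,ARES 对多个随机子样本分别计算秩再取平均。下面给出基于这一理解的极简示意(子样本个数、抽样比例等均为本文假设,并非论文官方实现):

```python
# 极简示意:ARES:子样本集成上的平均秩,对度量单位/尺度稳健
import numpy as np

def ares(X, n_subsamples=100, frac=0.5, seed=0):
    rng = np.random.default_rng(seed)
    n, d = X.shape
    avg_rank = np.zeros_like(X, dtype=float)
    for _ in range(n_subsamples):
        sub = X[rng.choice(n, int(n * frac), replace=False)]
        for j in range(d):  # 每个值在子样本该列中的(归一化)秩
            avg_rank[:, j] += np.searchsorted(np.sort(sub[:, j]), X[:, j]) / len(sub)
    return avg_rank / n_subsamples

X = np.random.rand(200, 3) * [1.0, 100.0, 0.01]  # 三个尺度悬殊的特征
X_ares = ares(X)  # 变换后各列同一量纲,且不像单次秩变换那样严格均匀分布
```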

【11】 VizAI : Selecting Accurate Visualizations of Numerical Data 标题:VizAI:选择准确的数字数据可视化 链接:https://arxiv.org/abs/2111.04190

作者:Ritvik Vij,Rohit Raj,Madhur Singhal,Manish Tanwar,Srikanta Bedathur 机构:Department of CSE, IIT Delhi, India 备注:Proc. of the ACM India Joint International Conference on Data Sciences and Management of Data (CODS-COMAD) 2022 (9th ACM IKDD CODS and 27th COMAD) - To Appear 摘要:A good data visualization is not only a distortion-free graphical representation of data but also a way to reveal underlying statistical properties of the data. Despite its common use across various stages of data analysis, selecting a good visualization often is a manual process involving many iterations. Recently there has been interest in reducing this effort by developing models that can recommend visualizations, but they are of limited use since they require large training samples (data and visualization pairs) and focus primarily on the design aspects rather than on assessing the effectiveness of the selected visualization. In this paper, we present VizAI, a generative-discriminative framework that first generates various statistical properties of the data from a number of alternative visualizations of the data. It is linked to a discriminative model that selects the visualization that best matches the true statistics of the data being visualized. VizAI can easily be trained with minimal supervision and adapts to settings with varying degrees of supervision easily. Using crowd-sourced judgements and a large repository of publicly available visualizations, we demonstrate that VizAI outperforms the state of the art methods that learn to recommend visualizations.

【12】 NLP From Scratch Without Large-Scale Pretraining: A Simple and Efficient Framework 标题:无需大规模预训练的NLP白手起家:一种简单高效的框架 链接:https://arxiv.org/abs/2111.04130

作者:Xingcheng Yao,Yanan Zheng,Xiaocong Yang,Zhilin Yang 机构: Institute for Interdisciplinary Information Sciences, Tsinghua University, Department of Computer Science and Technology, Tsinghua University, School of Economics and Management, Tsinghua University, Shanghai Qi Zhi Institute, Recurrent AI, Inc 备注:13 pages, 5 figures 摘要:Pretrained language models have become the standard approach for many NLP tasks due to strong performance, but they are very expensive to train. We propose a simple and efficient learning framework, TLM, that does not rely on large-scale pretraining. Given some labeled task data and a large general corpus, TLM uses task data as queries to retrieve a tiny subset of the general corpus and jointly optimizes the task objective and the language modeling objective from scratch. On eight classification datasets in four domains, TLM achieves results better than or similar to pretrained language models (e.g., RoBERTa-Large) while reducing the training FLOPs by two orders of magnitude. With high accuracy and efficiency, we hope TLM will contribute to democratizing NLP and expediting its development.
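TLM 的第一步是以任务数据为查询,从通用语料中检索一个小子集。下面用 TF-IDF 余弦相似度给出这一检索步骤的极简示意(论文实际使用的检索器与子集比例未必如此,此处均为假设):

```python
# 极简示意:以任务数据为查询检索通用语料子集(TLM 第一步的近似)
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

corpus = ["general text about sports", "finance news today", "a movie review"] * 100
task_data = ["this film was wonderful", "terrible acting and plot"]

vec = TfidfVectorizer().fit(corpus + task_data)
sims = cosine_similarity(vec.transform(task_data), vec.transform(corpus)).max(axis=0)
top_idx = np.argsort(-sims)[: max(1, int(0.01 * len(corpus)))]  # 取最相似的约 1%
subset = [corpus[i] for i in top_idx]
# 随后在 subset 上从零开始联合优化语言建模目标与任务目标(此处省略)
```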

【13】 Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias 标题:潜在混杂因素和选择偏差可能存在的迭代因果发现 链接:https://arxiv.org/abs/2111.04095

作者:Raanan Y. Rohekar,Shami Nisimov,Yaniv Gurwicz,Gal Novik 机构:Intel Labs 备注:35th Conference on Neural Information Processing Systems (NeurIPS 2021). arXiv admin note: text overlap with arXiv:2012.07513 摘要:We present a sound and complete algorithm, called iterative causal discovery (ICD), for recovering causal graphs in the presence of latent confounders and selection bias. ICD relies on the causal Markov and faithfulness assumptions and recovers the equivalence class of the underlying causal graph. It starts with a complete graph, and consists of a single iterative stage that gradually refines this graph by identifying conditional independence (CI) between connected nodes. Independence and causal relations entailed after any iteration are correct, rendering ICD anytime. Essentially, we tie the size of the CI conditioning set to its distance on the graph from the tested nodes, and increase this value in the successive iteration. Thus, each iteration refines a graph that was recovered by previous iterations having smaller conditioning sets -- a higher statistical power -- which contributes to stability. We demonstrate empirically that ICD requires significantly fewer CI tests and learns more accurate causal graphs compared to FCI, FCI+, and RFCI algorithms.

【14】 High Performance Out-of-sample Embedding Techniques for Multidimensional Scaling 标题:用于多维缩放的高性能样本外嵌入技术 链接:https://arxiv.org/abs/2111.04067

作者:Samudra Herath,Matthew Roughan,Gary Glonek 机构: University of Adelaide 摘要:The recent rapid growth of the dimension of many datasets means that many approaches to dimension reduction (DR) have gained significant attention. High-performance DR algorithms are required to make data analysis feasible for big and fast data sets. However, many traditional DR techniques are challenged by truly large data sets. In particular multidimensional scaling (MDS) does not scale well. MDS is a popular group of DR techniques because it can perform DR on data where the only input is a dissimilarity function. However, common approaches are at least quadratic in memory and computation and, hence, prohibitive for large-scale data. We propose an out-of-sample embedding (OSE) solution to extend the MDS algorithm for large-scale data utilising the embedding of only a subset of the given data. We present two OSE techniques: the first based on an optimisation approach and the second based on a neural network model. With a minor trade-off in the approximation, the out-of-sample techniques can process large-scale data with reasonable computation and memory requirements. While both methods perform well, the neural network model outperforms the optimisation approach of the OSE solution in terms of efficiency. OSE has the dual benefit that it allows fast DR on streaming datasets as well as static databases.
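样本外嵌入(OSE)中基于优化的做法,可以理解为固定已嵌入子集的坐标,再对每个新样本最小化其与子集之间的距离失真(stress)。下面给出一个极简示意(优化器、初始化与玩具数据均为本文假设):

```python
# 极简示意:MDS 的样本外嵌入:只优化新点坐标,代价与子集大小成线性
import numpy as np
from scipy.optimize import minimize

Z = np.random.randn(50, 2)  # 已由经典 MDS 嵌入的子集坐标(此处用随机点示意)
# 新样本到子集各点的相异度(玩具数据:取到 Z[0] 的距离并加噪声)
d_new = np.linalg.norm(Z - Z[0], axis=1) + 0.1 * np.random.rand(50)

def stress(z):
    return np.sum((np.linalg.norm(Z - z, axis=1) - d_new) ** 2)

z_new = minimize(stress, x0=Z.mean(axis=0)).x  # 新样本的二维坐标
```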

【15】 Quasi-potential theory for escape problem: Quantitative sharpness effect on SGD's escape from local minima 标题:逃逸问题的拟势理论:定量锐度对SGD逃逸局部极小值的影响 链接:https://arxiv.org/abs/2111.04004

作者:Hikaru Ibayashi,Masaaki Imaizumi 机构:University of Southern California, The University of Tokyo 摘要:We develop a quantitative theory on an escape problem of a stochastic gradient descent (SGD) algorithm and investigate the effect of sharpness of loss surfaces on the escape. Deep learning has achieved tremendous success in various domains, however, it has opened up various theoretical open questions. One of the typical questions is why an SGD can find parameters that generalize well over non-convex loss surfaces. An escape problem is an approach to tackle this question, which investigates how efficiently an SGD escapes from local minima. In this paper, we develop a quasi-potential theory for the escape problem, by applying a theory of stochastic dynamical systems. We show that the quasi-potential theory can handle both geometric properties of loss surfaces and a covariance structure of gradient noise in a unified manner, while they have been separately studied in previous works. Our theoretical results imply that (i) the sharpness of loss surfaces contributes to the slow escape of an SGD, and (ii) the SGD's noise structure cancels the effect and exponentially accelerates the escape. We also conduct experiments to empirically validate our theory using neural networks trained with real data.

【16】 NarrationBot and InfoBot: A Hybrid System for Automated Video Description 标题:NarariBot和InfoBot:一种视频自动描述的混合系统 链接:https://arxiv.org/abs/2111.03994

作者:Shasta Ihorn,Yue-Ting Siu,Aditya Bodi,Lothar Narins,Jose M. Castanon,Yash Kant,Abhishek Das,Ilmi Yoon,Pooyan Fazli 机构:San Francisco State University, San Francisco, CA, United States, Georgia Tech, Atlanta, GA, United States, Facebook AI Research, Menlo Park, CA, United States 备注:14 pages 摘要:Video accessibility is crucial for blind and low vision users for equitable engagements in education, employment, and entertainment. Despite the availability of professional and amateur services and tools, most human-generated descriptions are expensive and time consuming. Moreover, the rate of human-generated descriptions cannot match the speed of video production. To overcome the increasing gaps in video accessibility, we developed a hybrid system of two tools to 1) automatically generate descriptions for videos and 2) provide answers or additional descriptions in response to user queries on a video. Results from a mixed-methods study with 26 blind and low vision individuals show that our system significantly improved user comprehension and enjoyment of selected videos when both tools were used in tandem. In addition, participants reported no significant difference in their ability to understand videos when presented with autogenerated descriptions versus human-revised autogenerated descriptions. Our results demonstrate user enthusiasm about the developed system and its promise for providing customized access to videos. We discuss the limitations of the current work and provide recommendations for the future development of automated video description tools.

【17】 Proposing an Interactive Audit Pipeline for Visual Privacy Research 标题:提出一种用于视觉隐私研究的交互式审计流水线 链接:https://arxiv.org/abs/2111.03984

作者:Jasmine DeHart,Chenguang Xu,Lisa Egede,Christan Grant 机构:† School of Computer Science, University of Oklahoma, ⋆ Human Computer Interaction Institute, Carnegie Mellon University 备注:Extended version of IEEE BigData 2021 Short Paper, 15 pages 摘要:In an ideal world, deployed machine learning models will enhance our society. We hope that those models will provide unbiased and ethical decisions that will benefit everyone. However, this is not always the case; issues arise from the data curation process to the models' deployment. The continued use of biased datasets and processes will adversely damage communities and increase the cost to fix the problem. In this work, we walk through the decision process that a researcher will need to make before, during, and after their project to consider the broader impacts of research and the community. Throughout this paper, we observe the critical decisions that are often overlooked when deploying AI, argue for the use of fairness forensics to discover bias and fairness issues in systems, assert the need for a responsible human-over-the-loop to bring accountability into the deployed system, and finally, reflect on the need to explore research agendas that have harmful societal impacts. We examine visual privacy research and draw lessons that can apply broadly to Artificial Intelligence. Our goal is to provide a systematic analysis of the machine learning pipeline for visual privacy and bias issues. With this pipeline, we hope to raise stakeholder (e.g., researchers, modelers, corporations) awareness as these issues propagate in the various machine learning phases.

【18】 CALText: Contextual Attention Localization for Offline Handwritten Text 标题:CALText:脱机手写文本的上下文注意定位 链接:https://arxiv.org/abs/2111.03952

作者:Tayaba Anjum,Nazar Khan 机构:Department of Computer Science, University of the Punjab, Lahore, Pakistan 备注:25 pages, 15 figures and 6 tables 摘要:Recognition of Arabic-like scripts such as Persian and Urdu is more challenging than Latin-based scripts. This is due to the presence of a two-dimensional structure, context-dependent character shapes, spaces and overlaps, and placement of diacritics. Not much research exists for offline handwritten Urdu script which is the 10th most spoken language in the world. We present an attention based encoder-decoder model that learns to read Urdu in context. A novel localization penalty is introduced to encourage the model to attend only one location at a time when recognizing the next character. In addition, we comprehensively refine the only complete and publicly available handwritten Urdu dataset in terms of ground-truth annotations. We evaluate the model on both Urdu and Arabic datasets and show that contextual attention localization outperforms both simple attention and multi-directional LSTM models.

【19】 Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods 标题:策略梯度方法的时间离散化-不变安全动作重复 链接:https://arxiv.org/abs/2111.03941

作者:Seohong Park,Jaekyeom Kim,Gunhee Kim 机构:Seoul National University 摘要:In reinforcement learning, continuous time is often discretized by a time scale $\delta$, to which the resulting performance is known to be highly sensitive. In this work, we seek to find a $\delta$-invariant algorithm for policy gradient (PG) methods, which performs well regardless of the value of $\delta$. We first identify the underlying reasons that cause PG methods to fail as $\delta \to 0$, proving that the variance of the PG estimator can diverge to infinity in stochastic environments under a certain assumption of stochasticity. While durative actions or action repetition can be employed to have $\delta$-invariance, previous action repetition methods cannot immediately react to unexpected situations in stochastic environments. We thus propose a novel $\delta$-invariant method named Safe Action Repetition (SAR) applicable to any existing PG algorithm. SAR can handle the stochasticity of environments by adaptively reacting to changes in states during action repetition. We empirically show that our method is not only $\delta$-invariant but also robust to stochasticity, outperforming previous $\delta$-invariant approaches on eight MuJoCo environments with both deterministic and stochastic settings. Our code is available at https://vision.snu.ac.kr/projects/sar.

【20】 Convolutional Gated MLP: Combining Convolutions & gMLP 标题:卷积门控MLP:结合卷积和gMLP 链接:https://arxiv.org/abs/2111.03940

作者:A. Rajagopal,V. Nirmala 备注:Conference 摘要:To the best of our knowledge, this is the first paper to introduce Convolutions to Gated MultiLayer Perceptron and contributes an implementation of this novel Deep Learning architecture. Google Brain introduced the gMLP in May 2021. Microsoft introduced Convolutions in Vision Transformer in Mar 2021. Inspired by both gMLP and CvT, we introduce convolutional layers in gMLP. CvT combined the power of Convolutions and Attention. Our implementation combines the best of Convolutional learning along with spatial gated MLP. Further, the paper visualizes how CgMLP learns. Visualizations show how CgMLP learns from features such as the outline of a car. While Attention was the basis of much of recent progress in Deep Learning, gMLP proposed an approach that doesn't use Attention computation. In Transformer based approaches, a whole lot of Attention matrices need to be learnt using vast amounts of training data. In gMLP, fine-tuning for new tasks can be challenging when transfer learning with smaller datasets. We implement CgMLP and compare it with gMLP on the CIFAR dataset. Experimental results explore the power of generalization of CgMLP, while gMLP tends to drastically overfit the training data. To summarize, the paper contributes a novel Deep Learning architecture and demonstrates the learning mechanism of CgMLP through visualizations, for the first time in the literature.
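按我们对摘要的理解,CgMLP 的关键是在 gMLP 的空间门控单元中引入卷积的局部归纳偏置。下面是一个粗略草图(通道划分方式、卷积核大小与深度卷积的位置均为本文假设,并非论文的确切结构):

```python
# 极简示意:带卷积的空间门控单元(对 CgMLP 思想的猜测性草图)
import torch
import torch.nn as nn

class ConvSpatialGatingUnit(nn.Module):
    def __init__(self, dim, seq_len):
        super().__init__()
        half = dim // 2
        self.norm = nn.LayerNorm(half)
        self.spatial_proj = nn.Conv1d(seq_len, seq_len, kernel_size=1)  # gMLP 原有的 token 维线性混合
        self.dw_conv = nn.Conv1d(half, half, kernel_size=3, padding=1, groups=half)  # 新增:空间维深度卷积

    def forward(self, x):                       # x: (batch, seq_len, dim)
        u, v = x.chunk(2, dim=-1)               # 沿通道一分为二
        v = self.spatial_proj(self.norm(v))     # 全局的空间混合
        v = self.dw_conv(v.transpose(1, 2)).transpose(1, 2)  # 局部卷积归纳偏置
        return u * v                            # 空间门控

blk = ConvSpatialGatingUnit(dim=64, seq_len=49)
out = blk(torch.randn(2, 49, 64))               # -> (2, 49, 32)
```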

【21】 SOPE: Spectrum of Off-Policy Estimators 标题:SOPE:非政策估计器的光谱 链接:https://arxiv.org/abs/2111.03936

作者:Christina J. Yuan,Yash Chandak,Stephen Giguere,Philip S. Thomas,Scott Niekum 机构:University of Texas at Austin, University of Massachusetts 摘要:Many sequential decision making problems are high-stakes and require off-policy evaluation (OPE) of a new policy using historical data collected using some other policy. One of the most common OPE techniques that provides unbiased estimates is trajectory based importance sampling (IS). However, due to the high variance of trajectory IS estimates, importance sampling methods based on state-action visitation distributions (SIS) have recently been adopted. Unfortunately, while SIS often provides lower variance estimates for long horizons, estimating the state-action distribution ratios can be challenging and lead to biased estimates. In this paper, we present a new perspective on this bias-variance trade-off and show the existence of a spectrum of estimators whose endpoints are SIS and IS. Additionally, we also establish a spectrum for doubly-robust and weighted version of these estimators. We provide empirical evidence that estimators in this spectrum can be used to trade-off between the bias and variance of IS and SIS and can achieve lower mean-squared error than both IS and SIS.

【22】 TND-NAS: Towards Non-differentiable Objectives in Progressive Differentiable NAS Framework 标题:TND-NAS:渐进式可区分NAS框架中的不可区分目标 链接:https://arxiv.org/abs/2111.03892

作者:Bo Lyu,Shiping Wen,Zheng Yan,Kaibo Shi,Ke Li,Tingwen Huang 摘要:Differentiable architecture search has gradually become the mainstream research topic in the field of Neural Architecture Search (NAS) for its capability to improve efficiency compared with the early NAS (EA-based, RL-based) methods. Recent differentiable NAS also aims at further improving search efficiency, reducing the GPU-memory consumption, and addressing the "depth gap" issue. However, these methods are no longer capable of tackling the non-differentiable objectives, let alone multi-objectives, e.g., performance, robustness, efficiency, and other metrics. We propose an end-to-end architecture search framework towards non-differentiable objectives, TND-NAS, with the merits of the high efficiency in differentiable NAS framework and the compatibility among non-differentiable metrics in Multi-objective NAS (MNAS). Under differentiable NAS framework, with the continuous relaxation of the search space, TND-NAS has the architecture parameters ($\alpha$) been optimized in discrete space, while resorting to the search policy of progressively shrinking the supernetwork by $\alpha$. Our representative experiment takes two objectives (Parameters, Accuracy) as an example, we achieve a series of high-performance compact architectures on CIFAR10 (1.09M/3.3%, 2.4M/2.95%, 9.57M/2.54%) and CIFAR100 (2.46M/18.3%, 5.46/16.73%, 12.88/15.20%) datasets. Favorably, under real-world scenarios (resource-constrained, platform-specialized), the Pareto-optimal solutions can be conveniently reached by TND-NAS.

【23】 What augmentations are sensitive to hyper-parameters and why? 标题:哪些增量对超参数敏感?为什么? 链接:https://arxiv.org/abs/2111.03861

作者:Ch Muhammad Awais,Imad Eddine Ibrahim Bekkouch 机构:Machine Learning and Knowledge Representation Lab, Innopolis University, Innopolis, Russia; Sorbonne Center for Artificial Intelligence - SCAI, Sorbonne University, Paris, France 备注:10 pages, 17 figures 摘要:We apply augmentations to our dataset to enhance the quality of our predictions and make our final models more resilient to noisy data and domain drifts. Yet the question remains: how are these augmentations going to perform with different hyper-parameters? In this study we evaluate the sensitivity of augmentations with regards to the model's hyper-parameters along with their consistency and influence by performing a Local Surrogate (LIME) interpretation on the impact of hyper-parameters when different augmentations are applied to a machine learning model. We have utilized linear regression coefficients for weighting each augmentation. Our research has proved that there are some augmentations which are highly sensitive to hyper-parameters and others which are more resilient and reliable.

【24】 Focusing on Possible Named Entities in Active Named Entity Label Acquisition 标题:关注活动命名实体标签获取中可能的命名实体 链接:https://arxiv.org/abs/2111.03837

作者:Ali Osman Berk Sapci,Oznur Tastan,Reyyan Yeniterzi 机构:Sabancı University 备注:20 pages, 8 figures 摘要:Named entity recognition (NER) aims to identify mentions of named entities in an unstructured text and classify them into the predefined named entity classes. Even though deep learning-based pre-trained language models achieve good predictive performances, many domain-specific NER tasks still require a sufficient amount of labeled data. Active learning (AL), a general framework for the label acquisition problem, has been used for the NER tasks to minimize the annotation cost without sacrificing model performance. However, heavily imbalanced class distribution of tokens introduces challenges in designing effective AL querying methods for NER. We propose AL sentence query evaluation functions which pay more attention to possible positive tokens, and evaluate these proposed functions with both sentence-based and token-based cost evaluation strategies. We also propose a better data-driven normalization approach to penalize too long or too short sentences. Our experiments on three datasets from different domains reveal that the proposed approaches reduce the number of annotated tokens while achieving better or comparable prediction performance with conventional methods.

【25】 Dynamic Regret Minimization for Control of Non-stationary Linear Dynamical Systems 标题:非定常线性动态系统控制的动态遗憾最小化 链接:https://arxiv.org/abs/2111.03772

作者:Yuwei Luo,Varun Gupta,Mladen Kolar 摘要:We consider the problem of controlling a Linear Quadratic Regulator (LQR) system over a finite horizon $T$ with fixed and known cost matrices $Q,R$, but unknown and non-stationary dynamics $\{A_t, B_t\}$. The sequence of dynamics matrices can be arbitrary, but with a total variation, $V_T$, assumed to be $o(T)$ and unknown to the controller. Under the assumption that a sequence of stabilizing, but potentially sub-optimal controllers is available for all $t$, we present an algorithm that achieves the optimal dynamic regret of $\tilde{\mathcal{O}}\left(V_T^{2/5}T^{3/5}\right)$. With piece-wise constant dynamics, our algorithm achieves the optimal regret of $\tilde{\mathcal{O}}(\sqrt{ST})$ where $S$ is the number of switches. The crux of our algorithm is an adaptive non-stationarity detection strategy, which builds on an approach recently developed for contextual Multi-armed Bandit problems. We also argue that non-adaptive forgetting (e.g., restarting or using sliding window learning with a static window size) may not be regret optimal for the LQR problem, even when the window size is optimally tuned with the knowledge of $V_T$. The main technical challenge in the analysis of our algorithm is to prove that the ordinary least squares (OLS) estimator has a small bias when the parameter to be estimated is non-stationary. Our analysis also highlights that the key motif driving the regret is that the LQR problem is in spirit a bandit problem with linear feedback and locally quadratic cost. This motif is more universal than the LQR problem itself, and therefore we believe our results should find wider application.

【26】 An Algorithmic Theory of Metacognition in Minds and Machines 标题:心理与机器元认知的算法理论 链接:https://arxiv.org/abs/2111.03745

作者:Rylan Schaeffer 机构:Department of Computer Science, Stanford University 摘要:Humans sometimes choose actions that they themselves can identify as sub-optimal, or wrong, even in the absence of additional information. How is this possible? We present an algorithmic theory of metacognition based on a well-understood trade-off in reinforcement learning (RL) between value-based RL and policy-based RL. To the cognitive (neuro)science community, our theory answers the outstanding question of why information can be used for error detection but not for action selection. To the machine learning community, our proposed theory creates a novel interaction between the Actor and Critic in Actor-Critic agents and notes a novel connection between RL and Bayesian Optimization. We call our proposed agent the Metacognitive Actor Critic (MAC). We conclude with showing how to create metacognition in machines by implementing a deep MAC and showing that it can detect (some of) its own suboptimal actions without external information or delay.

【27】 Increasing Data Diversity with Iterative Sampling to Improve Performance 标题:通过迭代采样提高数据多样性以提高性能 链接:https://arxiv.org/abs/2111.03743

作者:Devrim Cavusoglu,Ogulcan Eryuksel,Sinan Altinuc 机构:OBSS AI 备注:5 pages, 2 (6) figures, to be published in 1st NeurIPS Data-Centric AI Workshop 摘要:As a part of the Data-Centric AI Competition, we propose a data-centric approach to improve the diversity of the training samples by iterative sampling. The method itself relies strongly on the fidelity of augmented samples and the diversity of the augmentation methods. Moreover, we improve the performance further by introducing more samples for the difficult classes, especially providing samples closer to edge cases, potentially those the model at hand misclassifies.

【28】 Sharp Bounds for Federated Averaging (Local SGD) and Continuous Perspective Link: https://arxiv.org/abs/2111.03741

Authors: Margalit Glasgow, Honglin Yuan, Tengyu Ma Affiliations: Stanford University Note: 47 pages; first two authors contributed equally Abstract: Federated Averaging (FedAvg), also known as Local SGD, is one of the most popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the convergence rate of FedAvg has thus far been undetermined. Even under the simplest assumptions (convex, smooth, homogeneous, and bounded covariance), the best-known upper and lower bounds do not match, and it is not clear whether the existing analysis captures the capacity of the algorithm. In this work, we first resolve this question by providing a lower bound for FedAvg that matches the existing upper bound, which shows that the existing FedAvg upper-bound analysis is not improvable. Additionally, we establish a lower bound in a heterogeneous setting that nearly matches the existing upper bound. While our lower bounds show the limitations of FedAvg, under an additional assumption of third-order smoothness we prove more optimistic state-of-the-art convergence results in both convex and non-convex settings. Our analysis stems from a notion we call iterate bias, defined as the deviation of the expectation of the SGD trajectory from the noiseless gradient descent trajectory with the same initialization. We prove novel sharp bounds on this quantity, and show intuitively how to analyze it from a stochastic differential equation (SDE) perspective.
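For reference, here is a minimal numpy sketch of the FedAvg (Local SGD) iteration being bounded: each client runs K local SGD steps and the server averages the resulting models. The quadratic client objectives $f_i(w) = \frac{1}{2}\|w - c_i\|^2$ and all hyper-parameters are illustrative assumptions, not the paper's setting.

import numpy as np

rng = np.random.default_rng(3)
num_clients, dim, rounds, local_steps, lr = 10, 5, 100, 20, 0.1
C = rng.standard_normal((num_clients, dim))     # client optima c_i

w = np.zeros(dim)                               # global model
for _ in range(rounds):
    local_models = []
    for i in range(num_clients):
        wi = w.copy()
        for _ in range(local_steps):            # K local SGD steps
            grad = (wi - C[i]) + 0.01 * rng.standard_normal(dim)
            wi -= lr * grad
        local_models.append(wi)
    w = np.mean(local_models, axis=0)           # server averages the models

# For these quadratics the global optimum is the mean of the client optima.
print("distance to global optimum:", np.linalg.norm(w - C.mean(axis=0)))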

【29】 Inertial Newton Algorithms Avoiding Strict Saddle Points Link: https://arxiv.org/abs/2111.04596

Authors: Camille Castera Affiliations: CNRS - IRIT, Université de Toulouse, Toulouse, France Abstract: We study the asymptotic behavior of second-order algorithms mixing Newton's method and inertial gradient descent in non-convex landscapes. We show that, despite the Newtonian behavior of these methods, they almost always escape strict saddle points. We also evidence the role played by the hyper-parameters of these methods in their qualitative behavior near critical points. The theoretical results are supported by numerical illustrations.
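One standard formulation of such dynamics is the Hessian-damped inertial system $\ddot{x} + \alpha \dot{x} + \beta \nabla^2 f(x)\dot{x} + \nabla f(x) = 0$; the numpy sketch below is an explicit-Euler discretization on a toy function with a strict saddle at the origin, intended only to illustrate the escape behavior (the test function and hyper-parameters are assumptions, not the paper's exact scheme).

import numpy as np

def grad(p):
    x, y = p
    return np.array([x**3 - x, y])          # f(x,y) = x^4/4 - x^2/2 + y^2/2

def hess(p):
    x, y = p
    return np.array([[3 * x**2 - 1.0, 0.0], [0.0, 1.0]])

p = np.array([1e-3, 0.5])                    # near the strict saddle at (0, 0)
v = np.zeros(2)
alpha, beta, h = 0.5, 0.1, 0.01
for _ in range(20000):
    v -= h * (alpha * v + beta * (hess(p) @ v) + grad(p))
    p += h * v
print("converged to:", p)                    # ends near a minimizer (+-1, 0)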

【30】 Fast and Scalable Spike and Slab Variable Selection in High-Dimensional Gaussian Processes Link: https://arxiv.org/abs/2111.04558

Authors: Hugh Dance, Brooks Paige Affiliations: University College London Abstract: Variable selection in Gaussian processes (GPs) is typically undertaken by thresholding the inverse lengthscales of "automatic relevance determination" kernels, but in high-dimensional datasets this approach can be unreliable. A more probabilistically principled alternative is to use spike and slab priors and infer a posterior probability of variable inclusion. However, existing implementations in GPs are extremely costly to run on both high-dimensional and large-$n$ datasets, or are intractable for most kernels. We therefore develop a fast and scalable variational inference algorithm for the spike and slab GP that is tractable with arbitrary differentiable kernels. We improve our algorithm's ability to adapt to the sparsity of relevant variables by Bayesian model averaging over hyperparameters, and achieve substantial speed-ups using zero-temperature posterior restrictions, dropout pruning, and nearest-neighbour minibatching. In experiments our method consistently outperforms vanilla and sparse variational GPs whilst retaining similar runtimes (even when $n=10^6$) and performs competitively with a spike and slab GP using MCMC but runs up to $1000$ times faster.
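The paper's contribution is the scalable variational algorithm itself; to make the inferential target concrete, here is a brute-force numpy sketch that, in a tiny problem, enumerates all binary inclusion vectors of a spike-and-slab ARD-style kernel and computes exact posterior inclusion probabilities (the kernel, synthetic data, and hyper-parameter values are illustrative assumptions).

import itertools
import numpy as np

rng = np.random.default_rng(4)
n, D, noise, p_incl = 40, 4, 0.1, 0.5
X = rng.standard_normal((n, D))
y = np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] + noise * rng.standard_normal(n)

def log_marginal(gates):
    # RBF kernel over the gated (included) dimensions only.
    Xg = X * gates
    sq = ((Xg[:, None, :] - Xg[None, :, :]) ** 2).sum(-1)
    K = np.exp(-0.5 * sq) + noise**2 * np.eye(n)
    sign, logdet = np.linalg.slogdet(K)
    return -0.5 * (y @ np.linalg.solve(K, y) + logdet + n * np.log(2 * np.pi))

configs = list(itertools.product([0, 1], repeat=D))
log_post = np.array([log_marginal(np.array(g, float))
                     + np.log(p_incl) * sum(g)
                     + np.log(1 - p_incl) * (D - sum(g))
                     for g in configs])
post = np.exp(log_post - log_post.max()); post /= post.sum()
incl_prob = sum(w * np.array(g) for w, g in zip(post, configs))
print("posterior inclusion probabilities:", np.round(incl_prob, 3))

Only the first two input dimensions influence y here, so their inclusion probabilities should come out high; the paper replaces this $2^D$ enumeration with a variational approximation that scales to high dimensions.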

【31】 Deep Neyman-Scott Processes Link: https://arxiv.org/abs/2111.03949

Authors: Chengkuan Hong, Christian R. Shelton Affiliations: Department of Computer Science and Engineering, University of California, Riverside Abstract: A Neyman-Scott process is a special case of a Cox process in which the latent and observable stochastic processes are both Poisson processes. In this paper we consider a deep Neyman-Scott process, in which the building components of the network are all Poisson processes. We develop an efficient posterior sampling scheme via Markov chain Monte Carlo and use it for likelihood-based inference. Our method opens up room for inference in sophisticated hierarchical point processes. Our experiments show that more hidden Poisson processes bring better performance for likelihood fitting and event-type prediction. We also compare our method with state-of-the-art models on real-world temporal datasets and demonstrate competitive abilities for both data fitting and prediction, using far fewer parameters.
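As background, a shallow (one hidden layer) Neyman-Scott process on an interval can be simulated in a few lines: latent parent points follow a Poisson process, and each parent emits a Poisson number of observable offspring scattered around it. The rates below are illustrative, and this sketch is not the paper's deep variant.

import numpy as np

rng = np.random.default_rng(5)
T, lam_parent, mean_offspring, sigma = 10.0, 2.0, 5.0, 0.2

# Latent layer: homogeneous Poisson process of parent (cluster-center) points.
n_parents = rng.poisson(lam_parent * T)
parents = rng.uniform(0, T, n_parents)

# Observable layer: each parent emits Poisson-many offspring, jittered around it.
events = np.concatenate([
    p + sigma * rng.standard_normal(rng.poisson(mean_offspring))
    for p in parents
]) if n_parents else np.array([])
print(f"{n_parents} latent parents -> {events.size} observed events")

The deep construction stacks further latent Poisson layers in the same spirit; only the observable bottom layer is seen during inference.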

【32】 First steps on Gamification of Lung Fluid Cells Annotations in the Flower Domain Link: https://arxiv.org/abs/2111.03663

Authors: Sonja Kunzmann, Christian Marzahl, Felix Denzinger, Christof A. Bertram, Robert Klopfleisch, Katharina Breininger, Vincent Christlein, Andreas Maier Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; Institute of Pathology, University of Veterinary Medicine, Vienna, Austria Note: 6 pages, 4 figures Abstract: Annotating data, especially in the medical domain, requires expert knowledge and a lot of effort. This limits the amount and/or usefulness of available medical data sets for experimentation. Therefore, developing strategies that increase the number of annotations while lowering the required domain knowledge is of interest. A possible strategy is the use of gamification, i.e., transforming the annotation task into a game. We propose an approach that gamifies the task of annotating lung fluid cells from pathological whole-slide images. As this domain is unknown to non-expert annotators, we transform images of cells detected with a RetinaNet architecture to the domain of flower images. This domain transfer is performed with a CycleGAN architecture for different cell types. In this more accessible domain, non-expert annotators can be (t)asked to annotate different kinds of flowers in a playful setting. As a proof of concept, this work shows that the domain transfer is possible by evaluating an image classification network trained on real cell images and tested on cell images generated by the CycleGAN network. The classification network reaches an accuracy of 97.48% and 95.16% on the original and transformed lung fluid cells, respectively. With this study, we lay the foundation for future research on gamification using CycleGANs.
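The quantitative check described above (train a classifier on real cell images, evaluate it on both real and CycleGAN-transformed images) follows a generic two-domain protocol; the sketch below reproduces only that protocol shape on synthetic feature vectors with a nearest-centroid classifier, so all data, shifts, and accuracy numbers are placeholders, not the paper's networks or results.

import numpy as np

rng = np.random.default_rng(6)
classes, dim, per_class = 3, 8, 100
centers = 3.0 * rng.standard_normal((classes, dim))

def make_split(shift):
    feats = np.vstack([c + shift + rng.standard_normal((per_class, dim))
                       for c in centers])
    labels = np.repeat(np.arange(classes), per_class)
    return feats, labels

X_real, y_real = make_split(shift=0.0)   # stand-in for real cell images
X_gan, y_gan = make_split(shift=0.3)     # stand-in for CycleGAN-transformed ones

# "Train" a nearest-centroid classifier on the real domain only.
centroids = np.vstack([X_real[y_real == c].mean(0) for c in range(classes)])
def accuracy(X, y):
    pred = ((X[:, None, :] - centroids[None]) ** 2).sum(-1).argmin(1)
    return (pred == y).mean()

print("accuracy on real:", accuracy(X_real, y_real))
print("accuracy on transformed:", accuracy(X_gan, y_gan))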
