
机器学习学术速递[7.26]

公众号-arXiv每日学术速递
发布2021-07-27 11:17:51
文章被收录于专栏:arXiv每日学术速递

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

cs.LG 方向,今日共计69篇

Graph相关(图学习|图神经网络|图优化等)(3篇)

【1】 Structack: Structure-based Adversarial Attacks on Graph Neural Networks 标题:Structack:基于结构的图神经网络对抗攻击

作者:Hussain Hussain,Tomislav Duricic,Elisabeth Lex,Denis Helic,Markus Strohmaier,Roman Kern 机构:Graz University of Technology, Austria, Know Center GmbH, RWTH Aachen, Germany, GESIS - Leibniz Institute for the Social Sciences 备注:Accepted as a full paper at ACM Hypertext on July 9, 2021 链接:https://arxiv.org/abs/2107.11327 摘要:最近的研究表明,图神经网络(GNN)容易受到针对图数据的对抗攻击。常见的攻击方法通常是知情的(informed),即攻击者可以获取节点属性信息(如标签和特征向量)。在这项工作中,我们研究不知情的(uninformed)对抗攻击:攻击者只能访问图结构,而无法获取任何节点属性信息。此时,攻击者的目标是利用GNN模型对图数据所做的结构知识和假设。特别地,已有文献表明,节点的结构中心性和相似性对GNN的学习有很大影响。因此,我们研究了中心性和相似性对GNN对抗攻击的影响。我们证明,攻击者可以利用这些信息,通过在低相似性以及(令人意外地)低中心性的节点之间注入链接来降低GNN的性能。我们还证明,基于结构的不知情攻击可以接近知情攻击的性能,同时计算效率更高。本文提出了一种新的GNN攻击策略,称为Structack。Structack能够在信息极为有限、计算约束严格的情况下成功操纵GNN的性能。我们的工作有助于在图上构建更健壮的机器学习方法。 摘要:Recent work has shown that graph neural networks (GNNs) are vulnerable to adversarial attacks on graph data. Common attack approaches are typically informed, i.e. they have access to information about node attributes such as labels and feature vectors. In this work, we study adversarial attacks that are uninformed, where an attacker only has access to the graph structure, but no information about node attributes. Here the attacker aims to exploit structural knowledge and assumptions, which GNN models make about graph data. In particular, literature has shown that structural node centrality and similarity have a strong influence on learning with GNNs. Therefore, we study the impact of centrality and similarity on adversarial attacks on GNNs. We demonstrate that attackers can exploit this information to decrease the performance of GNNs by focusing on injecting links between nodes of low similarity and, surprisingly, low centrality. We show that structure-based uninformed attacks can approach the performance of informed attacks, while being computationally more efficient. With our paper, we present a new attack strategy on GNNs that we refer to as Structack. 
Structack can successfully manipulate the performance of GNNs with very limited information while operating under tight computational constraints. Our work contributes towards building more robust machine learning approaches on graphs.
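根据摘要描述,Structack的核心机制可以用如下极简示意代码体会(非论文官方实现;以度作为中心性的代理、以邻居集合的Jaccard系数作为结构相似度,均为本示例的假设,论文中可采用其他度量):

```python
# 基于摘要描述的 Structack 思路极简示意(非官方实现):
# 在低中心性、低相似度的节点对之间注入链接以降低 GNN 性能。

def jaccard(adj, u, v):
    """两个节点邻居集合的 Jaccard 相似度。"""
    a, b = adj[u], adj[v]
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def structack_like_attack(adj, budget):
    """adj: {节点: 邻居集合}; 返回拟注入的 budget 条边。"""
    # 1) 按度(低中心性优先)排序,取出候选节点
    nodes = sorted(adj, key=lambda n: len(adj[n]))
    low_central = nodes[: max(2, 2 * budget)]
    # 2) 在候选节点中挑选相似度最低且尚不存在的节点对
    candidates = [
        (jaccard(adj, u, v), u, v)
        for i, u in enumerate(low_central)
        for v in low_central[i + 1:]
        if v not in adj[u]
    ]
    candidates.sort()
    return [(u, v) for _, u, v in candidates[:budget]]

# 玩具图:0-1-2 构成三角形,3、4 为低度节点
graph = {0: {1, 2, 4}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}, 4: {0}}
edges = structack_like_attack(graph, budget=1)
```

示例中预算为1时,攻击会选择度最低、且邻居集合毫无重叠的节点对(3, 4)来注入链接,这正是摘要所述"低中心性+低相似度"策略的体现。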

【2】 Human Pose Estimation from Sparse Inertial Measurements through Recurrent Graph Convolution 标题:基于循环图卷积的稀疏惯性测量人体姿态估计

作者:Patrik Puchert,Timo Ropinski 机构:Ulm University, Ulm, Germany 链接:https://arxiv.org/abs/2107.11214 摘要:本文提出一种邻接自适应图卷积长短时记忆网络(AAGC-LSTM),仅利用6个测量单元获得的稀疏惯性测量来估计人体姿态。AAGC-LSTM在单个网络操作中同时建模空间和时间依赖。这是通过为图卷积配备邻接自适应能力实现的,该能力还允许学习人体关节之间的未知依赖。为进一步提高准确性,我们提出考虑自然运动模式的纵向损失加权,以及身体感知的对侧数据增强。通过结合这些贡献,我们得以利用人体固有的图结构特性,从而在稀疏惯性测量的人体姿态估计上超越现有技术。 摘要:We propose the adjacency adaptive graph convolutional long-short term memory network (AAGC-LSTM) for human pose estimation from sparse inertial measurements, obtained from only 6 measurement units. The AAGC-LSTM combines both spatial and temporal dependency in a single network operation. This is made possible by equipping graph convolutions with adjacency adaptivity, which also allows for learning unknown dependencies of the human body joints. To further boost accuracy, we propose longitudinal loss weighting to consider natural movement patterns, as well as body-aware contralateral data augmentation. By combining these contributions, we are able to utilize the inherent graph nature of the human body, and can thus outperform the state of the art for human pose estimation from sparse inertial measurements.

【3】 Ego-GNNs: Exploiting Ego Structures in Graph Neural Networks 标题:Ego-GNN:在图神经网络中利用Ego结构

作者:Dylan Sandfelder,Priyesh Vijayan,William L. Hamilton 机构:McGill University, Mila - Quebec AI Institute 备注:None 链接:https://arxiv.org/abs/2107.10957 摘要:图神经网络(GNN)作为对图结构数据进行深度学习的框架已取得显著成功。然而,GNN从根本上受限于其树结构归纳偏置:WL子树核形式限制了GNN的表示能力,并且可以证明多项式时间的GNN无法识别图中的三角形。在这项工作中,我们提出用定义在ego图(即每个节点周围的诱导子图)上的信息来增强GNN的消息传递操作。我们将这类方法称为Ego-GNN,并证明Ego-GNN可证明地比标准消息传递GNN更强大。特别地,我们证明Ego-GNN能够识别封闭三角形;鉴于传递性在现实图中的普遍存在,这一能力至关重要。我们还从图信号处理的角度将该方法阐释为一种多重图卷积。在合成数据和真实数据上进行的节点分类实验突出了该方法可获得的性能增益。 摘要:Graph neural networks (GNNs) have achieved remarkable success as a framework for deep learning on graph-structured data. However, GNNs are fundamentally limited by their tree-structured inductive bias: the WL-subtree kernel formulation bounds the representational capacity of GNNs, and polynomial-time GNNs are provably incapable of recognizing triangles in a graph. In this work, we propose to augment the GNN message-passing operations with information defined on ego graphs (i.e., the induced subgraph surrounding each node). We term these approaches Ego-GNNs and show that Ego-GNNs are provably more powerful than standard message-passing GNNs. In particular, we show that Ego-GNNs are capable of recognizing closed triangles, which is essential given the prominence of transitivity in real-world graphs. We also motivate our approach from the perspective of graph signal processing as a form of multiplex graph convolution. Experimental results on node classification using synthetic and real data highlight the achievable performance gains using this approach.
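摘要中"标准消息传递无法识别三角形,而ego图信息可以"这一点,可以用下面的极简示意感受(非官方实现;取"邻居之间的边数"作为ego特征,它恰好等于包含该节点的三角形个数):

```python
# 示意:在每个节点的 ego 子图(节点及其邻居的诱导子图)上计算特征。
# 邻居之间存在的边数 = 经过该节点的三角形数,这是标准 1 层消息传递拿不到的信息。

def ego_triangle_feature(adj, node):
    """统计 node 的邻居之间存在的边数,即经过 node 的三角形数。"""
    nbrs = list(adj[node])
    return sum(
        1
        for i, u in enumerate(nbrs)
        for v in nbrs[i + 1:]
        if v in adj[u]
    )

# 三角形 0-1-2 加一条悬挂边 2-3:ego 特征可以区分节点是否处在三角形中
graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
feats = {n: ego_triangle_feature(graph, n) for n in graph}
```

在这个玩具图上,三角形中的节点0、1、2得到非零的ego特征,而悬挂节点3为零;真正的Ego-GNN是把此类ego子图信息融入消息传递,而非只算一个标量。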

Transformer(1篇)

【1】 Tsformer: Time series Transformer for tourism demand forecasting 标题:Tsformer:面向旅游需求预测的时间序列Transformer

作者:Siyuan Yi,Xing Chen,Chuanming Tang 机构:Chengdu University of Technology, Chengdu, China, Key Laboratory of Optical Engineering, Institute of Optics and Electronics, Chinese Academy of Sciences, Chengdu, China 链接:https://arxiv.org/abs/2107.10977 摘要:人工智能方法已广泛应用于旅游需求预测。然而,现有基于人工智能的方法缺乏处理长期依赖的能力,且大多缺乏可解释性。最初用于机器翻译的Transformer展现出处理长期依赖的出色能力。在此基础上,我们提出了一种采用编码器-解码器架构、用于旅游需求预测的时间序列Transformer(Tsformer)。该模型用编码器对长期依赖进行编码,用解码器捕捉短期依赖,并通过一系列注意力掩蔽机制,在突出主导注意力的前提下简化注意力交互。这些改进使多头注意力机制能够按照时间关系处理输入序列,从而提高可解释性。此外,编码器-解码器架构的上下文处理能力允许引入待预测日期的日历信息来提升预测性能。在九寨沟和四姑娘山旅游需求数据集上与另外9种基线方法的对比实验表明,Tsformer在短期和长期旅游需求预测任务中均优于所有基线模型。此外,消融研究表明,引入待预测日期的日历信息有助于提升Tsformer的预测性能。为了更好的可解释性,我们对注意力权重矩阵进行了可视化,结果表明在短期预测中Tsformer主要关注季节性特征以及与待预测日期相近的日期。 摘要:AI-based methods have been widely applied to tourism demand forecasting. However, current AI-based methods are short of the ability to process long-term dependency, and most of them lack interpretability. The Transformer used initially for machine translation shows an incredible ability to long-term dependency processing. Based on the Transformer, we proposed a time series Transformer (Tsformer) with Encoder-Decoder architecture for tourism demand forecasting. The proposed Tsformer encodes long-term dependency with encoder, captures short-term dependency with decoder, and simplifies the attention interactions under the premise of highlighting dominant attention through a series of attention masking mechanisms. These improvements make the multi-head attention mechanism process the input sequence according to the time relationship, contributing to better interpretability. What's more, the context processing ability of the Encoder-Decoder architecture allows adopting the calendar of days to be forecasted to enhance the forecasting performance. 
Experiments conducted on the Jiuzhaigou valley and Siguniang mountain tourism demand datasets with other nine baseline methods indicate that the proposed Tsformer outperformed all baseline models in the short-term and long-term tourism demand forecasting tasks. Moreover, ablation studies demonstrate that the adoption of the calendar of days to be forecasted contributes to the forecasting performance of the proposed Tsformer. For better interpretability, the attention weight matrix visualization is performed. It indicates that the Tsformer concentrates on seasonal features and days close to days to be forecast in short-term forecasting.

GAN|对抗|攻击|生成相关(6篇)

【1】 A Differentiable Language Model Adversarial Attack on Text Classifiers 标题:一种针对文本分类器的可微语言模型对抗攻击

作者:Ivan Fursov,Alexey Zaytsev,Pavel Burnyshev,Ekaterina Dmitrieva,Nikita Klyuchnikov,Andrey Kravchenko,Ekaterina Artemova,Evgeny Burnaev 机构: Skolkovo Institute of Science and Technology, Huawei Noah’s Ark lab, HSE University, Department of Computer Science, Oxford University 备注:arXiv admin note: substantial text overlap with arXiv:2006.11078 链接:https://arxiv.org/abs/2107.11275 摘要:由于大型基于Transformer的自然语言处理模型能力强大且被广泛采用,其健壮性是一个重要问题。理解和提高这些模型健壮性的一种方式是探索对抗攻击场景:检查输入的微小扰动能否欺骗模型。由于文本数据的离散性,计算机视觉中广泛使用的基于梯度的对抗方法本身并不适用。克服这一问题的标准策略是开发标记级(token-level)变换,但这类变换不考虑整个句子。本文提出了一种新的黑盒句子级攻击。我们的方法微调一个预训练语言模型来生成对抗样本。所提出的可微损失函数依赖于替代分类器得分和通过深度学习模型计算的近似编辑距离。我们证明,无论在计算指标还是人工评估上,所提攻击在多种NLP问题上都优于竞争方法。此外,由于使用了微调的语言模型,生成的对抗样本很难被检测,因此现有模型并不健壮。因而很难防御所提出的攻击,而其他攻击则并非如此。 摘要:Robustness of huge Transformer-based models for natural language processing is an important issue due to their capabilities and wide adoption. One way to understand and improve robustness of these models is an exploration of an adversarial attack scenario: check if a small perturbation of an input can fool a model. Due to the discrete nature of textual data, gradient-based adversarial methods, widely used in computer vision, are not applicable per se. The standard strategy to overcome this issue is to develop token-level transformations, which do not take the whole sentence into account. In this paper, we propose a new black-box sentence-level attack. Our method fine-tunes a pre-trained language model to generate adversarial examples. A proposed differentiable loss function depends on a substitute classifier score and an approximate edit distance computed via a deep learning model. We show that the proposed attack outperforms competitors on a diverse set of NLP problems for both computed metrics and human evaluation. Moreover, due to the usage of the fine-tuned language model, the generated adversarial examples are hard to detect, thus current models are not robust. 
Hence, it is difficult to defend from the proposed attack, which is not the case for other attacks.

【2】 Effective and Interpretable fMRI Analysis via Functional Brain Network Generation 标题:基于功能性脑网络生成的有效和可解释的fMRI分析

作者:Xuan Kan,Hejie Cui,Ying Guo,Carl Yang 机构:Department of Computer Science, Emory University; Department of Biostatistics and Bioinformatics, Emory University 备注:This paper has been accepted for ICML 2021 Workshop for Interpretable Machine Learning in Healthcare 链接:https://arxiv.org/abs/2107.11247 摘要:神经科学的最新研究表明,由fMRI数据构建的功能性脑网络在建模和临床预测方面具有巨大潜力。然而,现有的功能性脑网络含有噪声,且不感知下游预测任务,同时与GNN等新近强大的机器学习模型不兼容。在这项工作中,我们开发了一个端到端可训练的流水线,在下游预测任务的指导下提取显著的fMRI特征、生成脑网络并用GNN进行预测。在PNC fMRI数据上的初步实验表明,该框架具有优越的有效性和独特的可解释性。 摘要:Recent studies in neuroscience show great potential of functional brain networks constructed from fMRI data for popularity modeling and clinical predictions. However, existing functional brain networks are noisy and unaware of downstream prediction tasks, while also incompatible with recent powerful machine learning models of GNNs. In this work, we develop an end-to-end trainable pipeline to extract prominent fMRI features, generate brain networks, and make predictions with GNNs, all under the guidance of downstream prediction tasks. Preliminary experiments on the PNC fMRI data show the superior effectiveness and unique interpretability of our framework.

【3】 LARGE: Latent-Based Regression through GAN Semantics 标题:LARGE:通过GAN语义实现基于潜空间的回归

作者:Yotam Nitzan,Rinon Gal,Ofir Brenner,Daniel Cohen-Or 机构:Tel-Aviv University, yotamnitzan.github.io 备注:Code at this https URL 链接:https://arxiv.org/abs/2107.11186 摘要:我们提出了一种利用少样本或弱监督求解回归任务的新方法。我们方法的核心是一个基本观察:即使在完全无监督的设置下,GAN也能在其潜空间中非常成功地编码语义信息。对于现代生成框架,这种语义编码表现为以解耦方式影响图像属性的平滑线性方向。这些方向已被广泛用于基于GAN的图像编辑。我们证明,这些方向不仅是线性的,而且相应属性上引起的变化幅度与沿方向行进的距离近似成线性关系。利用这一观察,我们的方法仅用两个带标签样本就能把预训练GAN变成回归模型。这样就可以在难以获得高质量监督的数据集和属性上求解回归任务。此外,我们还表明,即使没有显式监督,同样的潜空间距离也可用于按给定属性的强度对图像集合排序。大量实验评估表明,我们的方法可应用于广泛领域,可利用多种潜方向发现框架,并在少样本和弱监督设置下取得最先进的结果,即使与为单一任务设计的方法相比也是如此。 摘要:We propose a novel method for solving regression tasks using few-shot or weak supervision. At the core of our method is the fundamental observation that GANs are incredibly successful at encoding semantic information within their latent space, even in a completely unsupervised setting. For modern generative frameworks, this semantic encoding manifests as smooth, linear directions which affect image attributes in a disentangled manner. These directions have been widely used in GAN-based image editing. We show that such directions are not only linear, but that the magnitude of change induced on the respective attribute is approximately linear with respect to the distance traveled along them. By leveraging this observation, our method turns a pre-trained GAN into a regression model, using as few as two labeled samples. This enables solving regression tasks on datasets and attributes which are difficult to produce quality supervision for. Additionally, we show that the same latent-distances can be used to sort collections of images by the strength of given attributes, even in the absence of explicit supervision. 
Extensive experimental evaluations demonstrate that our method can be applied across a wide range of domains, leverage multiple latent direction discovery frameworks, and achieve state-of-the-art results in few-shot and low-supervision settings, even when compared to methods designed to tackle a single task.
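摘要所述"属性变化幅度与沿潜方向行进的距离近似线性"的标定思路,可以用如下极简示意说明(非官方实现;潜码、语义方向与标签均为虚构示例数据,仅演示"两个带标签样本即可标定回归器"这一点):

```python
# LARGE 思路示意:把潜码在语义方向 d 上的投影,用两个 (潜码, 属性值) 样本
# 做线性标定,即得到一个属性回归器。

def project(w, d):
    """潜码 w 在方向 d 上的标量投影。"""
    return sum(wi * di for wi, di in zip(w, d))

def calibrate(w1, y1, w2, y2, d):
    """用两个带标签样本做线性标定,返回回归函数。"""
    p1, p2 = project(w1, d), project(w2, d)
    slope = (y2 - y1) / (p2 - p1)
    return lambda w: y1 + slope * (project(w, d) - p1)

d = (1.0, 0.0)  # 假想的某属性语义方向
reg = calibrate((0.0, 0.3), 20.0, (2.0, -0.1), 40.0, d)
pred = reg((1.0, 5.0))  # 该潜码的投影位于两个标定样本投影的中点
```

同样的投影值无需任何标签即可按属性强度对一批潜码(图像)排序,对应摘要中无显式监督的排序用法。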

【4】 Generative adversarial networks in time series: A survey and taxonomy 标题:时间序列中的生成对抗网络:综述与分类

作者:Eoin Brophy,Zhengwei Wang,Qi She,Tomas Ward 机构:Infant Research Centre & School of Computing, Dublin City University, Ireland; ByteDance AI Lab, China; Insight SFI Research Centre for Data Analytics 链接:https://arxiv.org/abs/2107.11098 摘要:过去几年,生成对抗网络(GAN)的研究呈指数增长。其影响主要体现在计算机视觉领域,在逼真图像和视频处理(尤其是生成)方面取得了重大进展。虽然这些计算机视觉方面的进展广受关注,但GAN的应用已经扩展到时间序列和序列生成等学科。作为GAN的一个相对较新的细分领域,相关研究工作正在进行,以生成高质量、多样化且保护隐私的时间序列数据。在本文中,我们回顾了为时间序列相关应用设计的GAN变体。我们提出了离散变体GAN与连续变体GAN的分类法,分别对应处理离散时间序列与连续时间序列数据的GAN。我们展示了该领域最新、最受关注的文献及其架构、结果和应用,并列出了最常用的评估指标及其在各应用中的适用性。此外,还讨论了这些GAN的隐私措施,以及处理敏感数据的进一步保护手段和方向。我们的目标是清晰而简洁地梳理该领域最新和最先进的研究及其在现实世界技术中的应用。 摘要:Generative adversarial networks (GANs) studies have grown exponentially in the past few years. Their impact has been seen mainly in the computer vision field with realistic image and video manipulation, especially generation, making significant advancements. While these computer vision advances have garnered much attention, GAN applications have diversified across disciplines such as time series and sequence generation. As a relatively new niche for GANs, fieldwork is ongoing to develop high quality, diverse and private time series data. In this paper, we review GAN variants designed for time series related applications. We propose a taxonomy of discrete-variant GANs and continuous-variant GANs, in which GANs deal with discrete time series and continuous time series data. Here we showcase the latest and most popular literature in this field; their architectures, results, and applications. We also provide a list of the most popular evaluation metrics and their suitability across applications. Also presented is a discussion of privacy measures for these GANs and further protections and directions for dealing with sensitive data. We aim to frame clearly and concisely the latest and state-of-the-art research in this area and their applications to real-world technologies.

【5】 Improving the Generalization of Meta-learning on Unseen Domains via Adversarial Shift 标题:通过对抗偏移提高元学习在未见域上的泛化能力

作者:Pinzhuo Tian,Yang Gao 机构:Nanjing University 备注:v1 链接:https://arxiv.org/abs/2107.11056 摘要:元学习为高效学习提供了一条有希望的途径,并在许多应用中取得了巨大成功。然而,大多数元学习文献关注处理来自同一领域的任务,因此难以泛化到其他未见领域的任务。在这项工作中,我们通过模拟来自其他未见领域的任务来解决这一问题,以提高元学习方法的泛化性和鲁棒性。具体来说,我们提出一个与模型无关的偏移层(shift layer)来学习如何模拟领域偏移并生成伪任务,并开发了一种新的对抗式learning-to-learn机制来训练它。基于伪任务,元学习模型可以学习跨领域元知识,从而在未见领域上具有良好的泛化能力。我们在领域泛化设置下进行了广泛实验。实验结果表明,所提偏移层适用于多种元学习框架。此外,我们的方法在不同的跨域少样本分类基准上取得了最先进的性能,在跨域少样本回归上也取得了很好的效果。 摘要:Meta-learning provides a promising way for learning to efficiently learn and achieves great success in many applications. However, most meta-learning literature focuses on dealing with tasks from a same domain, making it brittle to generalize to tasks from the other unseen domains. In this work, we address this problem by simulating tasks from the other unseen domains to improve the generalization and robustness of meta-learning method. Specifically, we propose a model-agnostic shift layer to learn how to simulate the domain shift and generate pseudo tasks, and develop a new adversarial learning-to-learn mechanism to train it. Based on the pseudo tasks, the meta-learning model can learn cross-domain meta-knowledge, which can generalize well on unseen domains. We conduct extensive experiments under the domain generalization setting. Experimental results demonstrate that the proposed shift layer is applicable to various meta-learning frameworks. Moreover, our method also leads to state-of-the-art performance on different cross-domain few-shot classification benchmarks and produces good results on cross-domain few-shot regression.

【6】 AD-GAN: End-to-end Unsupervised Nuclei Segmentation with Aligned Disentangling Training 标题:AD-GAN:基于对齐解缠训练的端到端无监督细胞核分割

作者:Kai Yao,Kaizhu Huang,Jie Sun,Curran Jude 机构:Xi’an Jiaotong-Liverpool University, Suzhou, Jiangsu Province, China, University of Liverpool 链接:https://arxiv.org/abs/2107.11022 摘要:本文考虑无监督的细胞核分割。利用最近提出的细胞核图像和随机合成掩模之间的非配对图像到图像的转换,现有的方法,例如CycleGAN,已经取得了令人鼓舞的结果。然而,这些方法通常采用两级流水线,无法实现对细胞核图像的端到端学习。更严重的是,它们可能导致有损变换问题,即原始图像和相应的分割输出之间的内容不一致。为了解决这些局限性,我们提出了一种新的端到端的无监督框架,称为对齐解缠生成对抗网络(AD-GAN)。独特的是,AD-GAN引入了表示解缠,将内容表示(底层空间结构)与风格表示(结构的呈现方式)分离开来。在这个框架下,空间结构可以被显式地保留,使得宏观层次的有损变换显著减少。我们还提出了一种新的训练算法,能够在潜在空间中对齐解缠后的内容,以减少微观层次的有损变换。 摘要:We consider unsupervised cell nuclei segmentation in this paper. Exploiting the recently-proposed unpaired image-to-image translation between cell nuclei images and randomly synthetic masks, existing approaches, e.g., CycleGAN, have achieved encouraging results. However, these methods usually take a two-stage pipeline and fail to learn end-to-end in cell nuclei images. More seriously, they could lead to the lossy transformation problem, i.e., the content inconsistency between the original images and the corresponding segmentation output. To address these limitations, we propose a novel end-to-end unsupervised framework called Aligned Disentangling Generative Adversarial Network (AD-GAN). Distinctively, AD-GAN introduces representation disentanglement to separate content representation (the underlying spatial structure) from style representation (the rendering of the structure). With this framework, spatial structure can be preserved explicitly, enabling a significant reduction of macro-level lossy transformation. We also propose a novel training algorithm able to align the disentangled content in the latent space to reduce micro-level lossy transformation. 
Evaluations on real-world 2D and 3D datasets show that AD-GAN substantially outperforms the other comparison methods and the professional software both quantitatively and qualitatively. Specifically, the proposed AD-GAN leads to significant improvement over the current best unsupervised methods by an average 17.8% relatively (w.r.t. the metric DICE) on four cell nuclei datasets. As an unsupervised method, AD-GAN even performs competitive with the best supervised models, taking a further leap towards end-to-end unsupervised nuclei segmentation.

半/弱/无/有监督|不确定性|主动学习(3篇)

【1】 MCDAL: Maximum Classifier Discrepancy for Active Learning 标题:MCDAL:主动学习的最大分类器差异

作者:Jae Won Cho,Dong-Jin Kim,Yunjae Jung,In So Kweon 机构:KAIST, South Korea. 备注:10 pages 链接:https://arxiv.org/abs/2107.11049 摘要:目前最先进的主动学习方法大多利用生成性对抗网络(GAN)进行样本获取;然而,GAN通常具有不稳定性和对超参数的敏感性。与这些方法相比,本文提出了一种新的主动学习框架,称为最大分类器差异主动学习(MCDAL),它考虑了多个分类器之间的预测差异。特别地,我们利用两个辅助分类层,通过最大化它们之间的差异来学习更紧密的决策边界。直观地说,辅助分类层预测结果的差异反映了预测结果的不确定性。在这方面,我们提出了一种新的方法来利用分类器的差异来实现主动学习。我们还提供了一个解释,我们的想法与现有的基于GAN的主动学习方法和领域适应框架。此外,我们还通过实验证明了我们的方法的实用性,在主动学习环境下,我们的方法在一些图像分类和语义分割数据集上的性能超过了最先进的方法。 摘要:Recent state-of-the-art active learning methods have mostly leveraged Generative Adversarial Networks (GAN) for sample acquisition; however, GAN is usually known to suffer from instability and sensitivity to hyper-parameters. In contrast to these methods, we propose in this paper a novel active learning framework that we call Maximum Classifier Discrepancy for Active Learning (MCDAL) which takes the prediction discrepancies between multiple classifiers. In particular, we utilize two auxiliary classification layers that learn tighter decision boundaries by maximizing the discrepancies among them. Intuitively, the discrepancies in the auxiliary classification layers' predictions indicate the uncertainty in the prediction. In this regard, we propose a novel method to leverage the classifier discrepancies for the acquisition function for active learning. We also provide an interpretation of our idea in relation to existing GAN based active learning methods and domain adaptation frameworks. Moreover, we empirically demonstrate the utility of our approach where the performance of our approach exceeds the state-of-the-art methods on several image classification and semantic segmentation datasets in active learning setups.
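MCDAL的采集函数思路(分类头之间预测差异越大、不确定性越高、越值得标注)可以用如下极简示意体会(非官方实现;此处用两个分类头的预测分布的L1差异作为示例度量,预测分布为虚构数据):

```python
# MCDAL 采集函数示意:对每个未标注样本,计算两个辅助分类头预测分布的差异,
# 选取差异最大的 k 个样本送去人工标注。

def l1_discrepancy(p, q):
    """两个预测分布之间的 L1 差异。"""
    return sum(abs(a - b) for a, b in zip(p, q))

def select_for_labeling(probs1, probs2, k):
    """probs1/probs2: 两个辅助分类头的逐样本预测分布;返回差异最大的 k 个样本下标。"""
    scores = [l1_discrepancy(p, q) for p, q in zip(probs1, probs2)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

head1 = [[0.90, 0.10], [0.50, 0.50], [0.60, 0.40]]
head2 = [[0.88, 0.12], [0.10, 0.90], [0.55, 0.45]]
picked = select_for_labeling(head1, head2, k=1)
```

示例中两个分类头对第1号样本(下标从0起)的预测分歧最大,因此它被优先选中;论文中的训练还包括最大化辅助头之间差异以收紧决策边界,这里未展示。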

【2】 Estimating Predictive Uncertainty Under Program Data Distribution Shift 标题:程序数据分布漂移下的预测不确定性估计

作者:Yufei Li,Simin Chen,Wei Yang 机构:University of Texas at Dallas, Dallas, USA 备注:12 pages, 3 figures 链接:https://arxiv.org/abs/2107.10989 摘要:深度学习(Deep learning,DL)技术在各种任务的预测准确率方面取得了巨大的成功,但深度神经网络(Deep neural networks,DNNs)在异常样本的预测准确率上也表现出很高的过度自信。定义良好的不确定性表明模型的输出是否应该(或不应该)被信任,因此在实际场景中变得至关重要,因为现实场景通常涉及到由于许多因素而改变的输入分布。现有的不确定性方法假设,测试来自不同数据分布的样本将导致不可靠的模型预测,因此具有更高的不确定性得分。他们通过校准DL模型对给定输入的置信度来量化模型的不确定性,并评估在计算机视觉(CV)和自然语言处理(NLP)相关任务中的有效性。然而,由于数据表示和转换模式的差异,在编程任务下,它们的方法论的可靠性可能会受到影响。在本文中,我们首先定义了程序数据中三种不同类型的分布移位,并构建了一个大规模的移位Java数据集。我们在我们的数据集上实现了两个通用编程语言任务来研究每个分布偏移对DL模型性能的影响。我们还提出了一个大规模的基准,现有的国家最先进的预测不确定性的编程任务,并探讨其有效性在数据分布转移。实验表明,程序分布偏移确实不同程度地降低了DL模型的性能,现有的不确定性方法在量化程序数据集的不确定性方面都存在一定的局限性。 摘要:Deep learning (DL) techniques have achieved great success in predictive accuracy in a variety of tasks, but deep neural networks (DNNs) are shown to produce highly overconfident scores for even abnormal samples. Well-defined uncertainty indicates whether a model's output should (or should not) be trusted and thus becomes critical in real-world scenarios which typically involves shifted input distributions due to many factors. Existing uncertainty approaches assume that testing samples from a different data distribution would induce unreliable model predictions thus have higher uncertainty scores. They quantify model uncertainty by calibrating DL model's confidence of a given input and evaluate the effectiveness in computer vision (CV) and natural language processing (NLP)-related tasks. However, their methodologies' reliability may be compromised under programming tasks due to difference in data representations and shift patterns. In this paper, we first define three different types of distribution shift in program data and build a large-scale shifted Java dataset. We implement two common programming language tasks on our dataset to study the effect of each distribution shift on DL model performance. 
We also propose a large-scale benchmark of existing state-of-the-art predictive uncertainty on programming tasks and investigate their effectiveness under data distribution shift. Experiments show that program distribution shift does degrade the DL model performance to varying degrees and that existing uncertainty methods all present certain limitations in quantifying uncertainty on program dataset.

【3】 Bagging, optimized dynamic mode decomposition (BOP-DMD) for robust, stable forecasting with spatial and temporal uncertainty-quantification 标题:BOP-DMD:用于稳健、稳定预测并提供时空不确定性量化的Bagging优化动态模式分解

作者:Diya Sashidhar,J. Nathan Kutz 机构:Department of Applied Mathematics, University of Washington, Seattle, WA 备注:12 pages, 8 figures, 2 algorithms 链接:https://arxiv.org/abs/2107.10878 摘要:动态模式分解(DMD)提供了一个回归框架,用于在时间或时空数据快照上自适应地学习最佳拟合线性动力学模型。人们已发展出多种回归技术来产生线性模型近似,其解在时间上呈指数形式。对于时空数据,DMD以主导模态结构及其随时间的指数/振荡行为的形式给出低秩且可解释的模型。然而,大多数DMD算法容易因动态的噪声测量而产生偏差,导致模型拟合差、预测能力不稳定。优化DMD算法通过变量投影优化使模型偏差最小化,从而使预测能力趋于稳定。在此基础上,我们使用统计bagging方法改进优化DMD算法:用同一组快照生成一个优化DMD模型的集成,并对这些模型的输出取平均,得到袋装优化动态模式分解(BOP-DMD)。BOP-DMD不仅提高了性能,还增强了模型的鲁棒性,同时提供空间和时间两方面的不确定性量化(UQ)。因此,与目前可用的DMD算法不同,BOP-DMD为概率或贝叶斯预测提供了稳定而健壮的模型,并附带全面的UQ指标。 摘要:Dynamic mode decomposition (DMD) provides a regression framework for adaptively learning a best-fit linear dynamics model over snapshots of temporal, or spatio-temporal, data. A diversity of regression techniques have been developed for producing the linear model approximation whose solutions are exponentials in time. For spatio-temporal data, DMD provides low-rank and interpretable models in the form of dominant modal structures along with their exponential/oscillatory behavior in time. The majority of DMD algorithms, however, are prone to bias errors from noisy measurements of the dynamics, leading to poor model fits and unstable forecasting capabilities. The optimized DMD algorithm minimizes the model bias with a variable projection optimization, thus leading to stabilized forecasting capabilities. Here, the optimized DMD algorithm is improved by using statistical bagging methods whereby a single set of snapshots is used to produce an ensemble of optimized DMD models. The outputs of these models are averaged to produce a bagging, optimized dynamic mode decomposition (BOP-DMD). BOP-DMD not only improves performance, it also robustifies the model and provides both spatial and temporal uncertainty quantification (UQ). 
Thus unlike currently available DMD algorithms, BOP-DMD provides a stable and robust model for probabilistic, or Bayesian forecasting with comprehensive UQ metrics.
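BOP-DMD的bagging骨架可以用如下示意代码体会(非官方实现:论文使用基于变量投影的"优化DMD",此处为简明起见以精确DMD代替,仅演示"对快照自助采样、集成平均并给出特征值不确定性"的流程):

```python
# BOP-DMD 骨架示意:对快照对做有放回自助采样,拟合多个 DMD 模型,
# 用特征值的均值作为集成估计、标准差作为不确定性量化(UQ)。

import numpy as np

def dmd_eigs(X, Y):
    """精确 DMD:最小二乘拟合 Y ≈ A X,返回 A 的特征值(排序后)。"""
    A = Y @ np.linalg.pinv(X)
    return np.sort_complex(np.linalg.eigvals(A))

def bop_dmd_eigs(snapshots, n_models=50, seed=0):
    """snapshots: (状态维, 时间步) 快照矩阵;返回集成特征值的均值与标准差。"""
    rng = np.random.default_rng(seed)
    X, Y = snapshots[:, :-1], snapshots[:, 1:]
    m = X.shape[1]
    all_eigs = []
    for _ in range(n_models):
        idx = rng.choice(m, size=m, replace=True)  # 自助采样快照对
        all_eigs.append(dmd_eigs(X[:, idx], Y[:, idx]))
    all_eigs = np.array(all_eigs)
    return all_eigs.mean(axis=0), all_eigs.std(axis=0)

# 无噪线性系统 x_{k+1} = A x_k,真特征值为 0.9 与 0.5
A_true = np.array([[0.9, 0.0], [0.0, 0.5]])
x = np.array([1.0, 1.0])
snaps = [x]
for _ in range(20):
    x = A_true @ x
    snaps.append(x)
mean_eigs, std_eigs = bop_dmd_eigs(np.array(snaps).T)
```

无噪声时每个自助模型都恢复出真特征值,标准差接近零;有噪声时该标准差即给出摘要所述的不确定性量化。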

迁移|Zero/Few/One-Shot|自适应(4篇)

【1】 Robust Adaptive Submodular Maximization 标题:鲁棒自适应子模最大化

作者:Shaojie Tang 机构:Naveen Jindal School of Management, The University of Texas at Dallas 链接:https://arxiv.org/abs/2107.11333 摘要:现有的自适应子模优化研究大多集中在平均情况,即目标是在已知的实现分布上找到一个使期望效用最大化的策略。然而,平均情况性能良好的策略在最坏情况实现下可能表现极差。在这项研究中,我们提出研究自适应子模优化问题的两个变体,即最坏情况自适应子模最大化和鲁棒子模最大化。第一个问题的目标是找到一个使最坏情况效用最大化的策略,后一个问题的目标是找到一个(若存在)同时获得接近最优的平均情况效用和最坏情况效用的策略。我们引入了一类新的随机函数,称为最坏情况子模函数。对于受$p$-系统约束的最坏情况自适应子模最大化问题,我们提出了一种自适应最坏情况贪心策略:当效用函数是最坏情况子模时,该策略相对最优最坏情况效用具有$\frac{1}{p+1}$近似比。对于具有基数约束的鲁棒自适应子模最大化问题,如果效用函数既是最坏情况子模又是自适应子模,我们提出了一种混合自适应策略,在最坏情况和平均情况下同时达到接近$1-e^{-\frac{1}{2}}$的近似。我们也描述了我们理论结果的几个应用,包括基于池的主动学习、随机子模集覆盖和自适应病毒营销。 摘要:Most of existing studies on adaptive submodular optimization focus on the average-case, i.e., their objective is to find a policy that maximizes the expected utility over a known distribution of realizations. However, a policy that has a good average-case performance may have very poor performance under the worst-case realization. In this study, we propose to study two variants of adaptive submodular optimization problems, namely, worst-case adaptive submodular maximization and robust submodular maximization. The first problem aims to find a policy that maximizes the worst-case utility and the latter one aims to find a policy, if any, that achieves both near optimal average-case utility and worst-case utility simultaneously. We introduce a new class of stochastic functions, called worst-case submodular functions. For the worst-case adaptive submodular maximization problem subject to a $p$-system constraint, we develop an adaptive worst-case greedy policy that achieves a $\frac{1}{p+1}$ approximation ratio against the optimal worst-case utility if the utility function is worst-case submodular. 
For the robust adaptive submodular maximization problem subject to a cardinality constraint, if the utility function is both worst-case submodular and adaptive submodular, we develop a hybrid adaptive policy that achieves an approximation close to $1-e^{-\frac{1}{2}}$ under both worst case setting and average case setting simultaneously. We also describe several applications of our theoretical results, including pool-based active learning, stochastic submodular set cover and adaptive viral marketing.
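"最坏情况贪心"的含义可以用如下极简示意说明(非论文的完整自适应设定;以覆盖函数为例,元素在不同实现下覆盖的集合不同,数据纯属虚构):

```python
# 最坏情况贪心示意:每步选择在所有可能实现下"最坏情况边际收益"最大的元素,
# 而非平均情况下收益最大的元素。

def worst_case_greedy(items, realizations, k):
    """items: {元素: {实现: 覆盖集合}};返回贪心选出的 k 个元素。"""
    chosen = []
    covered = {r: set() for r in realizations}
    for _ in range(k):
        def wc_gain(it):
            # 在所有实现中取最小的新增覆盖量,即最坏情况边际收益
            return min(len(items[it][r] - covered[r]) for r in realizations)
        best = max((it for it in items if it not in chosen), key=wc_gain)
        chosen.append(best)
        for r in realizations:
            covered[r] |= items[best][r]
    return chosen

realizations = ["good", "bad"]
items = {
    "a": {"good": {1, 2, 3}, "bad": set()},  # 平均情况好,最坏情况收益为零
    "b": {"good": {1, 2}, "bad": {4, 5}},    # 两种实现下都稳健
    "c": {"good": {3}, "bad": {4}},
}
picked = worst_case_greedy(items, realizations, k=1)
```

示例中平均收益最高的元素"a"在坏实现下毫无收益,最坏情况贪心因此选择了两种实现下都稳健的"b",体现了摘要中平均情况最优与最坏情况最优策略的差别。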

【2】 An Adaptive State Aggregation Algorithm for Markov Decision Processes 标题:马尔可夫决策过程的一种自适应状态聚集算法

作者:Guanting Chen,Johann Demetrio Gaebler,Matt Peng,Chunlin Sun,Yinyu Ye 机构:† Institute for Computational and Mathematical Engineering, Stanford University, ‡ Department of Electrical Engineering & Computer Sciences, University of California, Berkeley, § Department of Management Science and Engineering, Stanford University 链接:https://arxiv.org/abs/2107.11053 摘要:值迭代是求解马尔可夫决策过程(MDP)的一种著名方法,实现简单且具有很强的理论收敛保证。然而,随着状态空间增大,值迭代的计算量很快变得不可行。针对大状态空间和动作空间MDP中的值迭代,已有多种方法被提出来克服这一问题,但往往以牺牲通用性和算法简单性为代价。在本文中,我们提出了一个直观的MDP求解算法,通过动态地将cost-to-go值相近的状态聚成一组来降低值迭代更新的开销。我们还证明,该算法几乎必然收敛到在\(\ell^\infty\)范数下与真实最优值相差不超过\(2\varepsilon / (1 - \gamma)\)的范围内,其中\(\gamma\)是折扣因子,且同一聚合组内的状态值相差至多\(\varepsilon\)。在各种模拟环境下的数值实验证实了该算法的鲁棒性,以及它以更低更新开销求解MDP的能力,尤其是当MDP问题规模增大时。 摘要:Value iteration is a well-known method of solving Markov Decision Processes (MDPs) that is simple to implement and boasts strong theoretical convergence guarantees. However, the computational cost of value iteration quickly becomes infeasible as the size of the state space increases. Various methods have been proposed to overcome this issue for value iteration in large state and action space MDPs, often at the price, however, of generalizability and algorithmic simplicity. In this paper, we propose an intuitive algorithm for solving MDPs that reduces the cost of value iteration updates by dynamically grouping together states with similar cost-to-go values. We also prove that our algorithm converges almost surely to within \(2\varepsilon / (1 - \gamma)\) of the true optimal value in the \(\ell^\infty\) norm, where \(\gamma\) is the discount factor and aggregated states differ by at most \(\varepsilon\). Numerical experiments on a variety of simulated environments confirm the robustness of our algorithm and its ability to solve MDPs with much cheaper updates especially as the scale of the MDP problem increases.
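"按值相近动态聚合状态再做值迭代"的骨架可以用如下示意代码说明(非官方实现,且简化为确定性转移;分组方式为本示例的假设):

```python
# 自适应状态聚合值迭代示意:每轮先把当前 V 值相差不超过 eps 的状态并成一组、
# 以组均值代替,再做同步贝尔曼更新,以降低大状态空间下的更新开销。

def aggregated_value_iteration(P, R, gamma, eps, iters=200):
    """P[s][a] = 下一状态, R[s][a] = 即时回报;返回近似最优值函数 V。"""
    states = list(P)
    V = {s: 0.0 for s in states}
    for _ in range(iters):
        # 按当前 V 排序后,贪心地把与组首差值 <= eps 的相邻状态并入同一组
        groups, cur = [], []
        for s in sorted(states, key=lambda s: V[s]):
            if cur and V[s] - V[cur[0]] > eps:
                groups.append(cur)
                cur = []
            cur.append(s)
        groups.append(cur)
        for g in groups:
            mean = sum(V[s] for s in g) / len(g)
            for s in g:
                V[s] = mean
        # 同步贝尔曼更新(右侧使用聚合后的旧值)
        V = {s: max(R[s][a] + gamma * V[P[s][a]] for a in P[s]) for s in states}
    return V

# 两状态确定性 MDP:gamma=0.5 时最优值为 V(0)=2, V(1)=1
P = {0: {"stay": 0}, 1: {"go": 0}}
R = {0: {"stay": 1.0}, 1: {"go": 0.0}}
V = aggregated_value_iteration(P, R, gamma=0.5, eps=0.01)
```

在这个小例子里聚合几乎不改变结果(两个状态的值很快分离),但与摘要的理论界一致:最终值与真实最优值的偏差被 \(2\varepsilon/(1-\gamma)\) 控制。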

【3】 VisDA-2021 Competition Universal Domain Adaptation to Improve Performance on Out-of-Distribution Data 标题:VisDA-2021竞争通用域适配以提高分布外数据的性能

作者:Dina Bashkirova,Dan Hendrycks,Donghyun Kim,Samarth Mishra,Kate Saenko,Kuniaki Saito,Piotr Teterwak,Ben Usman 机构:Boston University, MIT-IBM Watson AI, UC Berkeley 备注:Neurips 2021 Competition Track 链接:https://arxiv.org/abs/2107.11011 摘要:机器学习的进展通常是通过在同一数据分布(即同一领域)上训练和测试模型来衡量的。这会高估模型在分布外数据上的未来准确率。视觉域适应(VisDA)2021竞赛测试模型适应新测试分布、处理分布偏移的能力。我们为图像分类器设置了无监督域适应挑战,并将评估其对新视点、背景、模态和质量退化的适应能力。我们的挑战利用大规模公开数据集,但构建跨域而非传统域内基准的评估。此外,我们将重点放在困难的“通用”(universal)设置上:除输入分布漂移外,方法在目标数据集中还可能遇到缺失和/或新的类别。性能将使用严格的协议来衡量,并借助既有指标与最先进的域适应方法进行比较。我们相信,这场竞赛将鼓励机器学习方法进一步提升在众多部署场景中处理真实数据的能力。 摘要:Progress in machine learning is typically measured by training and testing a model on the same distribution of data, i.e., the same domain. This over-estimates future accuracy on out-of-distribution data. The Visual Domain Adaptation (VisDA) 2021 competition tests models' ability to adapt to novel test distributions and handle distributional shift. We set up unsupervised domain adaptation challenges for image classifiers and will evaluate adaptation to novel viewpoints, backgrounds, modalities and degradation in quality. Our challenge draws on large-scale publicly available datasets but constructs the evaluation across domains, rather than the traditional in-domain bench-marking. Furthermore, we focus on the difficult "universal" setting where, in addition to input distribution drift, methods may encounter missing and/or novel classes in the target dataset. Performance will be measured using a rigorous protocol, comparing to state-of-the-art domain adaptation methods with the help of established metrics. We believe that the competition will encourage further improvement in machine learning methods' ability to handle realistic data in many deployment scenarios.

【4】 Compositional Models: Multi-Task Learning and Knowledge Transfer with Modular Networks 标题:组合模型:基于模块化网络的多任务学习与知识转移

作者:Andrey Zhmoginov,Dina Bashkirova,Mark Sandler 机构:Google LLC, Amphitheatre Parkway, Mountain View, CA , USA, Boston University, Boston, MA , USA 链接:https://arxiv.org/abs/2107.10963 摘要:条件计算和模块化网络最近被提出用于多任务学习和其他问题,作为一种将问题求解分解为多个可重用计算块的方法。我们提出了一种新的模块化网络学习方法,该方法基于等距(isometric)版本的ResNet,其中所有残差块具有相同的结构和相同的参数数目。这种体系结构选择允许添加、删除和更改残差块的顺序。在我们的方法中,这些模块可以被重复调用,并且通过调整计算顺序,允许知识转移到新的任务中。这允许任务间的软权重共享,只需少量增加参数数量。结果表明,在多任务学习、迁移学习和领域适应的情况下,该方法可以实现模块的可解释自组织,同时在这些任务上获得竞争性结果。 摘要:Conditional computation and modular networks have been recently proposed for multitask learning and other problems as a way to decompose problem solving into multiple reusable computational blocks. We propose a new approach for learning modular networks based on the isometric version of ResNet with all residual blocks having the same configuration and the same number of parameters. This architectural choice allows adding, removing and changing the order of residual blocks. In our method, the modules can be invoked repeatedly and allow knowledge transfer to novel tasks by adjusting the order of computation. This allows soft weight sharing between tasks with only a small increase in the number of parameters. We show that our method leads to interpretable self-organization of modules in case of multi-task learning, transfer learning and domain adaptation while achieving competitive results on those tasks. 
From practical perspective, our approach allows to: (a) reuse existing modules for learning new task by adjusting the computation order, (b) use it for unsupervised multi-source domain adaptation to illustrate that adaptation to unseen data can be achieved by only manipulating the order of pretrained modules, (c) show how our approach can be used to increase accuracy of existing architectures for image classification tasks such as ImageNet, without any parameter increase, by reusing the same block multiple times.

强化学习(2篇)

【1】 Model Selection for Offline Reinforcement Learning: Practical Considerations for Healthcare Settings 标题:离线强化学习的模型选择:医疗设置的实际考虑

作者:Shengpu Tang,Jenna Wiens 机构:Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, USA 备注:33 pages, 9 figures. Machine Learning for Healthcare Conference (MLHC 2021) 链接:https://arxiv.org/abs/2107.11003 摘要:强化学习(RL)可以用来学习治疗策略和辅助医疗决策。然而,由于需要在复杂的状态/动作空间上进行泛化,引入函数逼近器(例如,深度神经网络)需要进行模型选择,以减少过度拟合并提高部署时的策略性能。然而,用于模型选择的标准验证管道需要在实际环境中运行学习到的策略,这在医疗环境中通常是不可行的。在这项工作中,我们研究了离线RL的模型选择管道,它依赖于离策略评估(off-policy evaluation, OPE)作为验证性能的代理。我们对流行的OPE方法进行了深入分析,强调了在对一组候选策略进行排序时额外的超参数和计算要求(辅助模型的拟合/推理)。我们在学习治疗脓毒症患者的场景下,比较了作为模型选择管道一部分的不同OPE方法的效用。在我们考虑的所有OPE方法中,拟合Q评价(FQE)始终给出最佳的验证排名,但计算成本很高。 摘要:Reinforcement learning (RL) can be used to learn treatment policies and aid decision making in healthcare. However, given the need for generalization over complex state/action spaces, the incorporation of function approximators (e.g., deep neural networks) requires model selection to reduce overfitting and improve policy performance at deployment. Yet a standard validation pipeline for model selection requires running a learned policy in the actual environment, which is often infeasible in a healthcare setting. In this work, we investigate a model selection pipeline for offline RL that relies on off-policy evaluation (OPE) as a proxy for validation performance. We present an in-depth analysis of popular OPE methods, highlighting the additional hyperparameters and computational requirements (fitting/inference of auxiliary models) when used to rank a set of candidate policies. We compare the utility of different OPE methods as part of the model selection pipeline in the context of learning to treat patients with sepsis. Among all the OPE methods we considered, fitted Q evaluation (FQE) consistently leads to the best validation ranking, but at a high computational cost. 
To balance this trade-off between accuracy of ranking and computational efficiency, we propose a simple two-stage approach to accelerate model selection by avoiding potentially unnecessary computation. Our work serves as a practical guide for offline RL model selection and can help RL practitioners select policies using real-world datasets. To facilitate reproducibility and future extensions, the code accompanying this paper is available online at https://github.com/MLD3/OfflineRL_ModelSelection.
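摘要中给出最佳验证排名的拟合Q评价(FQE),在表格情形下可以粗略示意如下(简化草图:论文中 Q 由函数逼近器回归拟合,这里用分组平均代替;数据格式与策略均为演示用的假设)——给定离线数据集,反复用目标 r + γ·Q(s', π(s')) 回归 Q(s, a),收敛后的 Q 即为待评估策略的价值估计。

```python
import numpy as np

def fitted_q_evaluation(transitions, policy, n_states, n_actions,
                        gamma=0.5, n_iters=50):
    # transitions: (s, a, r, s_next, done) 列表;policy[s] 给出待评估策略的动作
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        targets = np.zeros_like(Q)
        counts = np.zeros_like(Q)
        for s, a, r, s2, done in transitions:
            y = r if done else r + gamma * Q[s2, policy[s2]]
            targets[s, a] += y
            counts[s, a] += 1
        mask = counts > 0
        Q[mask] = targets[mask] / counts[mask]   # 表格版"回归":按 (s,a) 取均值
    return Q
```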

【2】 A reinforcement learning approach to resource allocation in genomic selection 标题:基因组选择中资源分配的强化学习方法

作者:Saba Moeinizade,Guiping Hu,Lizhi Wang 机构:Industrial and Manufacturing Systems Engineering Department, Iowa State University 备注:18 pages,5 figures 链接:https://arxiv.org/abs/2107.10901 摘要:基因组选择(GS)是植物育种家用来选择个体进行交配并产生新一代物种的技术。资源配置是GS中的关键因素。在每一个选择周期中,育种家都面临着预算分配的选择,以进行杂交并产生下一代的育种亲本。受强化学习在人工智能问题上最新进展的启发,我们开发了一种基于强化学习的算法来自动学习在不同世代的育种中分配有限的资源。我们在马尔可夫决策过程(MDP)的框架下,通过定义状态空间和动作空间,对问题进行了数学描述。为了避免状态空间的爆炸,提出了一个整数线性规划来量化资源和时间之间的权衡。最后,我们提出一个价值函数近似方法来估计动作价值函数,然后发展贪婪策略改进技术来寻找最佳资源分配。通过一个采用真实数据的案例研究,我们证明了该方法在提高遗传增益方面的有效性。 摘要:Genomic selection (GS) is a technique that plant breeders use to select individuals to mate and produce new generations of species. Allocation of resources is a key factor in GS. At each selection cycle, breeders are facing the choice of budget allocation to make crosses and produce the next generation of breeding parents. Inspired by recent advances in reinforcement learning for AI problems, we develop a reinforcement learning-based algorithm to automatically learn to allocate limited resources across different generations of breeding. We mathematically formulate the problem in the framework of Markov Decision Process (MDP) by defining state and action spaces. To avoid the explosion of the state space, an integer linear program is proposed that quantifies the trade-off between resources and time. Finally, we propose a value function approximation method to estimate the action-value function and then develop a greedy policy improvement technique to find the optimal resources. We demonstrate the effectiveness of the proposed method in enhancing genetic gain using a case study with realistic data.

元学习(1篇)

【1】 A novel meta-learning initialization method for physics-informed neural networks 标题:一种新的物理信息神经网络元学习初始化方法

作者:Xu Liu,Xiaoya Zhang,Wei Peng,Weien Zhou,Wen Yao 链接:https://arxiv.org/abs/2107.10991 摘要:物理信息神经网络(PINNs)已被广泛应用于求解各种科学计算问题。然而,庞大的训练成本限制了PINNs在一些实时应用中的使用。虽然已经有一些提高PINNs训练效率的工作,但很少考虑初始化的影响。为此,我们提出了一种基于新Reptile初始化的物理信息神经网络(NRPINN)。原始的Reptile算法是一种基于标记数据的元学习初始化方法。通过在损失函数中加入偏微分方程(PDE)作为惩罚项,PINNs可以用较少的标记数据甚至没有任何标记数据来训练。受此启发,我们提出了新的Reptile初始化,从参数化偏微分方程中采样更多任务,并调整损失中的惩罚项。新的Reptile初始化通过有监督、无监督和半监督学习从相关任务中获取初始化参数。然后,带初始化参数的PINNs可以高效地求解PDE。此外,新的Reptile初始化也可以用于PINNs的各种变体。最后,我们在正问题(包括求解Poisson、Burgers和Schrödinger方程)和逆问题(估计PDE中的未知参数)上对NRPINN进行了演示和验证。 摘要:Physics-informed neural networks (PINNs) have been widely used to solve various scientific computing problems. However, large training costs limit PINNs for some real-time applications. Although some works have been proposed to improve the training efficiency of PINNs, few consider the influence of initialization. To this end, we propose a New Reptile initialization based Physics-Informed Neural Network (NRPINN). The original Reptile algorithm is a meta-learning initialization method based on labeled data. PINNs can be trained with less labeled data or even without any labeled data by adding partial differential equations (PDEs) as a penalty term into the loss function. Inspired by this idea, we propose the new Reptile initialization to sample more tasks from the parameterized PDEs and adapt the penalty term of the loss. The new Reptile initialization can acquire initialization parameters from related tasks by supervised, unsupervised, and semi-supervised learning. Then, PINNs with initialization parameters can efficiently solve PDEs. Besides, the new Reptile initialization can also be used for the variants of PINNs. Finally, we demonstrate and verify the NRPINN considering both forward problems, including solving Poisson, Burgers, and Schr\"odinger equations, as well as inverse problems, where unknown parameters in the PDEs are estimated. 
Experimental results show that the NRPINN training is much faster and achieves higher accuracy than PINNs with other initialization methods.
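Reptile 元初始化的核心是外层更新 θ ← θ + ε(θ_task − θ):在采样任务上做几步内层梯度下降后,把元参数朝适应后的参数移动一小步。下面用一个玩具任务族(拟合 y = a·x 的标量模型,与论文的 PINN 设定无关,仅演示 Reptile 更新本身;任务与超参数均为假设)示意:

```python
import numpy as np

def sgd_adapt(theta, slope, rng, lr=0.02, steps=32):
    # 内层:在单个任务(拟合 y = slope * x)上做若干步随机梯度下降
    for _ in range(steps):
        x = rng.uniform(-1.0, 1.0, size=16)
        grad = np.mean(2.0 * (theta - slope) * x * x)  # d/dθ (θx - slope·x)^2 的批均值
        theta = theta - lr * grad
    return theta

def reptile_init(slopes, rounds=200, outer_lr=0.5, seed=0):
    # 外层:Reptile 元更新 theta <- theta + outer_lr * (theta_task - theta)
    rng = np.random.default_rng(seed)
    theta = 0.0
    for _ in range(rounds):
        slope = rng.choice(slopes)
        adapted = sgd_adapt(theta, slope, rng)
        theta = theta + outer_lr * (adapted - theta)
    return theta
```

得到的元初始化会落在任务族"中间",使任何新任务都只需很少的适应步数;论文中内层任务换成了带 PDE 惩罚项的 PINN 损失。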

推荐(2篇)

【1】 Adaptively Weighted Top-N Recommendation for Organ Matching 标题:用于器官匹配的自适应加权Top-N推荐

作者:Parshin Shojaee,Xiaoyu Chen,Ran Jin 机构:Grado Department of Industrial and Systems Engineering, Virginia Tech, Blacksburg, VA , USA 链接:https://arxiv.org/abs/2107.10971 摘要:减少器官捐献的短缺以满足候诊病人的需求是器官移植的一大挑战。器官匹配决策是将有限的存活器官分配给最合适的患者的最关键的决策。目前,器官配型决策只能由第一性原理建立的配型模型计算出的配型分数来决定。然而,这些模型可能与移植后的实际匹配表现(例如,患者的移植后生活质量(QoL)或移植失败测量)不一致。本文将器官匹配决策问题描述为top-N推荐问题,提出了一种自适应加权top-N推荐(AWTR)方法。AWTR通过使用历史数据集中有限的实际匹配性能以及从器官捐赠者和患者收集的协变量来改进当前评分模型的性能。AWTR通过强调前N名匹配患者的推荐和排名准确性来牺牲整体推荐准确性。提出的方法在一个模拟研究中得到了验证,其中KAS[60]被用来模拟器官患者的推荐反应。结果表明,该方法的性能优于7种最新的top-N推荐基准方法。 摘要:Reducing the shortage of organ donations to meet the demands of patients on the waiting list has being a major challenge in organ transplantation. Because of the shortage, organ matching decision is the most critical decision to assign the limited viable organs to the most suitable patients. Currently, organ matching decisions were only made by matching scores calculated via scoring models, which are built by the first principles. However, these models may disagree with the actual post-transplantation matching performance (e.g., patient's post-transplant quality of life (QoL) or graft failure measurements). In this paper, we formulate the organ matching decision-making as a top-N recommendation problem and propose an Adaptively Weighted Top-N Recommendation (AWTR) method. AWTR improves performance of the current scoring models by using limited actual matching performance in historical data set as well as the collected covariates from organ donors and patients. AWTR sacrifices the overall recommendation accuracy by emphasizing the recommendation and ranking accuracy for top-N matched patients. The proposed method is validated in a simulation study, where KAS [60] is used to simulate the organ-patient recommendation response. The results show that our proposed method outperforms seven state-of-the-art top-N recommendation benchmark methods.

【2】 What are you optimizing for? Aligning Recommender Systems with Human Values 标题:您在针对什么进行优化?使推荐系统与人类价值观保持一致

作者:Jonathan Stray,Ivan Vendrov,Jeremy Nixon,Steven Adler,Dylan Hadfield-Menell 机构:Department of Electrical Engineering and Computer Science 备注:Originally presented at the ICML 2020 Participatory Approaches to Machine Learning workshop 链接:https://arxiv.org/abs/2107.10939 摘要:我们描述了一些真实推荐系统为服务于各种人类价值观(如多样性、公平性、幸福感、“值得花费的时间”以及事实准确性)而被修改的案例。由此,我们总结了价值工程的当前实践:使用带有基于价值的标签的人工创建数据来构建分类器。这在实践中对各种问题都行得通,但问题一次只解决一个,而且用户和其他利益相关者很少参与。相反,我们转向人工智能对齐(AI alignment)研究,寻找能够直接从利益相关者那里学习复杂价值观的方法,并确定了四个主要方向:有用的对齐度量、参与式设计与运营、交互式价值学习以及知情的审议判断。 摘要:We describe cases where real recommender systems were modified in the service of various human values such as diversity, fairness, well-being, time well spent, and factual accuracy. From this we identify the current practice of values engineering: the creation of classifiers from human-created data with value-based labels. This has worked in practice for a variety of issues, but problems are addressed one at a time, and users and other stakeholders have seldom been involved. Instead, we look to AI alignment work for approaches that could learn complex values directly from stakeholders, and identify four major directions: useful measures of alignment, participatory design and operation, interactive value learning, and informed deliberative judgments.

联邦学习|隐私保护|加密(2篇)

【1】 Communication Efficiency in Federated Learning: Achievements and Challenges 标题:联合学习中的沟通效率:成就与挑战

作者:Osama Shahid,Seyedamin Pouriyeh,Reza M. Parizi,Quan Z. Sheng,Gautam Srivastava,Liang Zhao 机构:∗ Department of Information Technology, Kennesaw State University, Marietta, GA, USA, † Department of Software Engineering and Game Development, Kennesaw State University, Marietta, GA, USA, ‡Department of Computing, Macquarie University, Sydney, Australia 链接:https://arxiv.org/abs/2107.10996 摘要:联邦学习(FL)以分布式方式执行机器学习任务。多年来,这已成为一项新兴技术,特别是随着各种数据保护和隐私政策的实施,它允许在应对这些挑战的同时执行机器学习任务。任何新技术的出现,都会带来挑战和好处。FL中存在的一个挑战是通信成本:FL发生在分布式环境中,通过网络连接的设备必须不断共享其更新,这会造成通信瓶颈。在本文中,我们综述了为克服联邦学习环境中通信限制所开展的研究。 摘要:Federated Learning (FL) is known to perform Machine Learning tasks in a distributed manner. Over the years, this has become an emerging technology especially with various data protection and privacy policies being imposed FL allows performing machine learning tasks whilst adhering to these challenges. As with the emerging of any new technology, there are going to be challenges and benefits. A challenge that exists in FL is the communication costs, as FL takes place in a distributed environment where devices connected over the network have to constantly share their updates this can create a communication bottleneck. In this paper, we present a survey of the research that is performed to overcome the communication constraints in an FL setting.

【2】 Federated Learning Versus Classical Machine Learning: A Convergence Comparison 标题:联合学习与经典机器学习的收敛性比较

作者:Muhammad Asad,Ahmed Moustafa,Takayuki Ito 机构:Department of Computer Science, Nagoya Institute of Technology,-, Nagoya - Japan 链接:https://arxiv.org/abs/2107.10976 摘要:在过去的几十年里,机器学习已经彻底改变了数据处理的大规模应用。同时,越来越多的隐私威胁导致了经典数据训练模型的重新设计。特别是,经典的机器学习涉及到集中的数据训练,在这里收集数据,整个训练过程在中央服务器上执行。尽管有显著的收敛性,但本次训练涉及参与者与中央云服务器共享数据时的若干隐私威胁。为此,联邦学习在分布式数据训练中具有重要的地位。特别是,联合学习允许参与者在本地数据上协作训练本地模型,而无需向中央云服务器透露其敏感信息。本文在logistic回归MNIST数据集和image-classification-CIFAR-10数据集上对经典机器学习和联邦学习的收敛性进行了比较。仿真结果表明,联邦学习在保持参与者匿名性的前提下,在有限的通信回合内实现了更高的收敛性。我们希望这项研究能显示联合学习的益处,并有助于联合学习的广泛实施。 摘要:In the past few decades, machine learning has revolutionized data processing for large scale applications. Simultaneously, increasing privacy threats in trending applications led to the redesign of classical data training models. In particular, classical machine learning involves centralized data training, where the data is gathered, and the entire training process executes at the central server. Despite significant convergence, this training involves several privacy threats on participants' data when shared with the central cloud server. To this end, federated learning has achieved significant importance over distributed data training. In particular, the federated learning allows participants to collaboratively train the local models on local data without revealing their sensitive information to the central cloud server. In this paper, we perform a convergence comparison between classical machine learning and federated learning on two publicly available datasets, namely, logistic-regression-MNIST dataset and image-classification-CIFAR-10 dataset. The simulation results demonstrate that federated learning achieves higher convergence within limited communication rounds while maintaining participants' anonymity. We hope that this research will show the benefits and help federated learning to be implemented widely.
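这两篇联邦学习摘要所对比的分布式训练流程,最常见的形式是 FedAvg:每轮各客户端在本地数据上训练,服务器按样本量加权平均参数,原始数据从不离开客户端。下面是一个线性回归上的最小示意(并非论文的实验代码;数据、客户端数与超参数均为演示用的假设):

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    # 客户端本地训练:线性回归的批量梯度下降
    for _ in range(epochs):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg(clients, dim, rounds=50):
    # clients: [(X_i, y_i), ...];服务器只聚合模型参数,从不接触原始数据
    w = np.zeros(dim)
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    for _ in range(rounds):
        local_ws = [local_sgd(w, X, y) for X, y in clients]
        w = np.average(local_ws, axis=0, weights=sizes)  # 按样本量加权平均
    return w
```

这里每轮全量参与、不做压缩;上文第一篇综述讨论的正是如何减少这种每轮参数交换带来的通信开销。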

推理|分析|理解|解释(3篇)

【1】 VisMCA: A Visual Analytics System for Misclassification Correction and Analysis. VAST Challenge 2020, Mini-Challenge 2 Award: Honorable Mention for Detailed Analysis of Patterns of Misclassification 标题:VisMCA:一个用于错误分类校正和分析的可视化分析系统。VAST Challenge 2020,Mini-Challenge 2 奖:错误分类模式详细分析荣誉奖

作者:Huyen N. Nguyen,Jake Gonzalez,Jian Guo,Ngan V. T. Nguyen,Tommy Dang 机构:Department of Computer Science, Texas Tech University 备注:None 链接:https://arxiv.org/abs/2107.11181 摘要:本文介绍了VisMCA,一个交互式可视化分析系统,它支持加深对ML结果的理解,增强用户纠正错误分类的能力,并提供对底层模式的分析,以响应VAST Challenge 2020 Mini-Challenge 2。VisMCA有助于追踪数据来源(provenance),并提供目标检测结果的全面视图,简化重新标记,并生成可靠、正确的数据以供将来训练。我们的解决方案实现了多个可视化分析视图,为底层模式发现提供了深入的见解。 摘要:This paper presents VisMCA, an interactive visual analytics system that supports deepening understanding in ML results, augmenting users' capabilities in correcting misclassification, and providing an analysis of underlying patterns, in response to the VAST Challenge 2020 Mini-Challenge 2. VisMCA facilitates tracking provenance and provides a comprehensive view of object detection results, easing re-labeling, and producing reliable, corrected data for future training. Our solution implements multiple analytical views on visual analysis to offer a deep insight for underlying pattern discovery.

【2】 Domain Generalization under Conditional and Label Shifts via Variational Bayesian Inference 标题:基于变分贝叶斯推理的条件移位和标签移位下的区域泛化

作者:Xiaofeng Liu,Bo Hu,Linghao Jin,Xu Han,Fangxu Xing,Jinsong Ouyang,Jun Lu,Georges EL Fakhri,Jonghye Woo 机构:Dept. of Radiology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA, Beth Israel Deaconess Medical Center and Harvard Medical School, Boston, MA, USA, National University of Singapore, Singapore 备注:30th International Joint Conference on Artificial Intelligence (IJCAI) 2021 链接:https://arxiv.org/abs/2107.10931 摘要:在这项工作中,我们提出了一种领域泛化(DG)方法来学习多个标记的源领域,并将知识转移到训练中无法访问的目标领域。考虑到固有的条件转移和标签转移,我们希望对齐$p(x | y)$和$p(y)$。然而,广泛使用的领域不变特征学习(IFL)方法依赖于对齐边缘概念移位w.r.t.$p(x)$,这是基于一个不切实际的假设,$p(y)$是跨领域不变的。在此基础上,我们提出了一种新的变分贝叶斯推理框架,通过潜在空间中的先验分布匹配来实现条件分布对齐w.r.t.$p(x | y)$,同时考虑了后验对齐时的边缘标签移位w.r.t.$p(y)$。在各种基准上的大量实验表明,我们的框架对标签移动具有鲁棒性,跨域精度得到显著提高,从而实现了优于传统IFL框架的性能。 摘要:In this work, we propose a domain generalization (DG) approach to learn on several labeled source domains and transfer knowledge to a target domain that is inaccessible in training. Considering the inherent conditional and label shifts, we would expect the alignment of $p(x|y)$ and $p(y)$. However, the widely used domain invariant feature learning (IFL) methods relies on aligning the marginal concept shift w.r.t. $p(x)$, which rests on an unrealistic assumption that $p(y)$ is invariant across domains. We thereby propose a novel variational Bayesian inference framework to enforce the conditional distribution alignment w.r.t. $p(x|y)$ via the prior distribution matching in a latent space, which also takes the marginal label shift w.r.t. $p(y)$ into consideration with the posterior alignment. Extensive experiments on various benchmarks demonstrate that our framework is robust to the label shift and the cross-domain accuracy is significantly improved, thereby achieving superior performance over the conventional IFL counterparts.

【3】 Optimum Risk Portfolio and Eigen Portfolio: A Comparative Analysis Using Selected Stocks from the Indian Stock Market 标题:最优风险投资组合与特征投资组合:基于印度股市精选股票的比较分析

作者:Jaydip Sen,Sidra Mehtab 机构:Praxis Business School, Kolkata, India 备注:The is the preprint of our accepted paper in the journal International Journal of Business Forecasting and Marketing Intelligence published by Inderscience Publishers, Switzerland. It consists of 35 pages, and includes 29 figures and 36 tables 链接:https://arxiv.org/abs/2107.11371 摘要:设计一个最优的投资组合,将权重分配给它的成份股,从而在收益和风险之间实现最佳的权衡,是一个具有挑战性的研究问题。Markowitz提出的经典均值-方差投资组合理论在实际股市数据上表现为次优,因为预期收益的估计误差会对投资组合的表现产生不利影响。本文介绍了印度股市七个重要板块的三种投资组合设计方法,即最小风险投资组合、最优风险投资组合和特征投资组合。从2016年1月1日至2020年12月31日,从雅虎财经网站上截取股票的每日历史价格。为本研究选择的七个行业中的每一个行业建立了三个投资组合,并根据训练数据分析了投资组合,这些投资组合基于若干指标,如年化收益率和风险,分配给组成股票的权重、相关热图和特征投资组合的主成分。最后,对所有行业的最优风险投资组合和特征投资组合进行了为期六个月的收益测试。比较了投资组合的表现,确定了各行业回报率较高的投资组合。 摘要:Designing an optimum portfolio that allocates weights to its constituent stocks in a way that achieves the best trade-off between the return and the risk is a challenging research problem. The classical mean-variance theory of portfolio proposed by Markowitz is found to perform sub-optimally on the real-world stock market data since the error in estimation for the expected returns adversely affects the performance of the portfolio. This paper presents three approaches to portfolio design, viz, the minimum risk portfolio, the optimum risk portfolio, and the Eigen portfolio, for seven important sectors of the Indian stock market. The daily historical prices of the stocks are scraped from Yahoo Finance website from January 1, 2016, to December 31, 2020. Three portfolios are built for each of the seven sectors chosen for this study, and the portfolios are analyzed on the training data based on several metrics such as annualized return and risk, weights assigned to the constituent stocks, the correlation heatmaps, and the principal components of the Eigen portfolios. Finally, the optimum risk portfolios and the Eigen portfolios for all sectors are tested on their return over a period of a six-month period. 
The performances of the portfolios are compared and the portfolio yielding the higher return for each sector is identified.
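摘要中的特征投资组合与最小风险投资组合都可以直接由收益协方差矩阵得到(下面仅是概念草图:允许做空、忽略交易约束与论文的具体数据流程):特征投资组合取协方差矩阵最大特征值对应的特征向量并归一化,最小方差组合有解析解 w ∝ Σ⁻¹·1。

```python
import numpy as np

def eigen_portfolio(returns):
    # returns: (T, N) 日收益率矩阵;取协方差矩阵最大特征值对应的特征向量
    cov = np.cov(returns, rowvar=False)
    _, eigvecs = np.linalg.eigh(cov)   # eigh 按特征值升序排列
    v = eigvecs[:, -1]
    if v.sum() < 0:
        v = -v                          # 符号约定:使权重之和为正
    return v / v.sum()

def min_variance_portfolio(returns):
    # 最小方差组合(允许做空)的解析解:w 正比于 Sigma^{-1} * 1
    cov = np.cov(returns, rowvar=False)
    ones = np.ones(cov.shape[0])
    w = np.linalg.solve(cov, ones)
    return w / w.sum()
```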

检测相关(2篇)

【1】 Dynamic detection of mobile malware using smartphone data and machine learning 标题:利用智能手机数据和机器学习动态检测移动恶意软件

作者:J. S. Panman de Wit,J. van der Ham,D. Bucur 备注:14 pages content, 22 pages total, to be published in ACM DTRAP (currently in last revision phase) 链接:https://arxiv.org/abs/2107.11167 摘要:移动恶意软件是针对移动设备的恶意程序。这是一个日益严重的问题,每年检测到的移动恶意软件样本数量都在上升。智能手机活跃用户的数量预计将增长,这凸显了研究移动恶意软件检测的重要性。移动恶意软件的检测方法是存在的,但仍然有限。在本文中,我们概述了在不使用特权访问的情况下,机器学习(ML)技术检测Android恶意软件的性能。ML分类器使用诸如CPU使用率、电池使用率和内存使用率等设备信息来检测Android操作系统(OS)上的10个子类型的移动木马。我们使用了一个真实的数据集,其中包含一年(2016年)47个用户的设备和恶意软件数据。我们考察设备的哪些特征(即方面)最需要监控,以检测(各子类型的)移动木马。本文的重点是动态硬件特征。利用这些动态特征,我们应用了最先进的机器学习分类器:随机森林、K近邻和AdaBoost。我们展示了不同特征集上的分类结果,并区分了全局设备特征和特定应用特征。所有被测特征集都不需要特权访问。我们的结果表明,随机森林分类器作为一个通用的恶意软件分类器表现最好:在10个子类型的移动木马中,它的F1得分为0.73,假阳性率(FPR)为0.009,假阴性率(FNR)为0.380。随机森林、K近邻和AdaBoost分类器在分别训练以检测每种移动木马子类型时,F1得分高于0.72,FPR低于0.02,FNR低于0.33。 摘要:Mobile malware are malicious programs that target mobile devices. They are an increasing problem, as seen in the rise of detected mobile malware samples per year. The number of active smartphone users is expected to grow, stressing the importance of research on the detection of mobile malware. Detection methods for mobile malware exist but are still limited. In this paper, we provide an overview of the performance of machine learning (ML) techniques to detect malware on Android, without using privileged access. The ML-classifiers use device information such as the CPU usage, battery usage, and memory usage for the detection of 10 subtypes of Mobile Trojans on the Android Operating System (OS). We use a real-life dataset containing device and malware data from 47 users for a year (2016). We examine which features, i.e. aspects, of a device, are most important to monitor to detect (subtypes of) Mobile Trojans. The focus of this paper is on dynamic hardware features. Using these dynamic features we apply state-of-the-art machine learning classifiers: Random Forest, K-Nearest Neighbour, and AdaBoost. 
We show classification results on different feature sets, making a distinction between global device features, and specific app features. None of the measured feature sets require privileged access. Our results show that the Random Forest classifier performs best as a general malware classifier: across 10 subtypes of Mobile Trojans, it achieves an F1 score of 0.73 with a False Positive Rate (FPR) of 0.009 and a False Negative Rate (FNR) of 0.380. The Random Forest, K-Nearest Neighbours, and AdaBoost classifiers achieve F1 scores above 0.72, an FPR below 0.02 and, an FNR below 0.33, when trained separately to detect each subtype of Mobile Trojans.
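论文对比的分类器之一是 K 近邻;下面在合成的设备特征(CPU/电池/内存占用,均值与方差纯属虚构,并非论文数据)上给出一个最小 kNN 示意,仅说明"用动态硬件特征做二分类"的形式:

```python
import numpy as np

def make_device_data(n_per_class=100, seed=0):
    # 合成示例:假设木马样本的 CPU/电池/内存占用整体偏高(数值纯属虚构)
    rng = np.random.default_rng(seed)
    benign = rng.normal([0.2, 0.3, 0.25], 0.05, size=(n_per_class, 3))
    trojan = rng.normal([0.6, 0.6, 0.55], 0.05, size=(n_per_class, 3))
    X = np.vstack([benign, trojan])
    y = np.array([0] * n_per_class + [1] * n_per_class)
    return X, y

def knn_predict(X_train, y_train, X_test, k=5):
    # 朴素 k 近邻:欧氏距离下取 k 个最近邻做多数投票
    preds = []
    for x in X_test:
        dist = np.linalg.norm(X_train - x, axis=1)
        nearest = np.argsort(dist)[:k]
        preds.append(int(np.argmax(np.bincount(y_train[nearest]))))
    return np.array(preds)
```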

【2】 HURRA! Human readable router anomaly detection 标题:哈拉!人类可读的路由器异常检测

作者:Jose M. Navarro,Dario Rossi 备注:None 链接:https://arxiv.org/abs/2107.11078 摘要:本文介绍了HURRA系统,该系统旨在减少操作人员在网络故障排除过程中所花费的时间。为此,它包括两个模块,在任何异常检测算法之后插入:(i)第一个注意机制,根据当前特征与异常的关系对其进行排序;(ii)第二个模块能够无缝地结合先前的专家知识,而不需要任何人的交互或决策。我们在一组真实的路由器数据集上展示了这些简单过程的有效性,这些数据集来自数十个isp,表现出丰富的异常和非常异构的kpi集,在这些数据集上,我们通过解决疑难解答问题的操作员手动收集注释的地面真相。我们的实验评估表明:(i)所提出的系统在与专家达成高度一致方面是有效的,(ii)即使是简单的统计方法也能够从过去案例中获得的专家知识中提取有用的信息,以进一步提高性能,最后(iii)实时部署的主要困难涉及异常检测算法的自动选择及其超参数的调整。 摘要:This paper presents HURRA, a system that aims to reduce the time spent by human operators in the process of network troubleshooting. To do so, it comprises two modules that are plugged after any anomaly detection algorithm: (i) a first attention mechanism, that ranks the present features in terms of their relation with the anomaly and (ii) a second module able to incorporates previous expert knowledge seamlessly, without any need of human interaction nor decisions. We show the efficacy of these simple processes on a collection of real router datasets obtained from tens of ISPs which exhibit a rich variety of anomalies and very heterogeneous set of KPIs, on which we gather manually annotated ground truth by the operator solving the troubleshooting ticket. Our experimental evaluation shows that (i) the proposed system is effective in achieving high levels of agreement with the expert, that (ii) even a simple statistical approach is able to extracting useful information from expert knowledge gained in past cases to further improve performance and finally that (iii) the main difficulty in live deployment concerns the automated selection of the anomaly detection algorithm and the tuning of its hyper-parameters.

分类|识别(1篇)

【1】 RGB Image Classification with Quantum Convolutional Ansaetze 标题:基于量子卷积分析的RGB图像分类

作者:Yu Jing,Yang Yang,Chonghang Wu,Wenbing Fu,Wei Hu,Xiaogang Li,Hua Xu 机构:Kunfeng Quantum 链接:https://arxiv.org/abs/2107.11099 摘要:随着量子硬件技术中量子比特数和相干时间的快速增长,在所谓的含噪中等规模量子(NISQ)器件上实现浅层神经网络引起了人们的广泛兴趣。针对灰度图像分类任务,已经提出了许多量子(卷积)电路模板(ansaetze),并取得了很好的实验结果。然而,将这些电路模板应用于RGB图像时,对视觉任务有用的通道内信息并没有得到有效的提取。本文提出两种量子电路模板来模拟RGB图像上的卷积运算,它们在通道间和通道内信息的提取方式上有所不同。据我们所知,这是量子卷积电路第一次有效地处理RGB图像,并且与纯经典CNNs相比具有更高的测试精度。我们还研究了量子电路ansatz的大小与混合量子经典卷积神经网络的可学习性之间的关系。通过基于CIFAR-10和MNIST数据集的实验,我们证明了较大尺寸的量子电路ansatz在多类分类任务中提高了预测性能,为近期量子算法的发展提供了有益的启示。 摘要:With the rapid growth of qubit numbers and coherence times in quantum hardware technology, implementing shallow neural networks on the so-called Noisy Intermediate-Scale Quantum (NISQ) devices has attracted a lot of interest. Many quantum (convolutional) circuit ansaetze are proposed for grayscale images classification tasks with promising empirical results. However, when applying these ansaetze on RGB images, the intra-channel information that is useful for vision tasks is not extracted effectively. In this paper, we propose two types of quantum circuit ansaetze to simulate convolution operations on RGB images, which differ in the way how inter-channel and intra-channel information are extracted. To the best of our knowledge, this is the first work of a quantum convolutional circuit to deal with RGB images effectively, with a higher test accuracy compared to the purely classical CNNs. We also investigate the relationship between the size of quantum circuit ansatz and the learnability of the hybrid quantum-classical convolutional neural network. Through experiments based on CIFAR-10 and MNIST datasets, we demonstrate that a larger size of the quantum circuit ansatz improves predictive performance in multiclass classification tasks, providing useful insights for near term quantum algorithm developments.

编码器(1篇)

【1】 Heteroscedastic Temporal Variational Autoencoder For Irregularly Sampled Time Series 标题:用于不规则采样时间序列的异方差时间变分自动编码器

作者:Satya Narayan Shukla,Benjamin M. Marlin 机构:University of Massachusetts Amherst, Amherst, MA , USA 链接:https://arxiv.org/abs/2107.11350 摘要:不规则采样时间序列通常出现在多个领域,它们对标准的深度学习模型提出了重大挑战。本文提出了一种新的不规则采样时间序列概率插值的深度学习框架,称之为异方差时间变分自编码器(HeTVAE)。HeTVAE包括一个新的输入层来编码关于输入观测稀疏性的信息,一个时间VAE架构来传播由于输入稀疏性而产生的不确定性,以及一个异方差输出层来实现输出插值中的可变不确定性。我们的研究结果表明,与一系列的基线模型和传统模型,以及最近提出的使用同方差(homoscedastic)输出层的深度潜变量模型相比,所提出的结构能够更好地反映由于稀疏和不规则采样引起的随时间变化的不确定性。 摘要:Irregularly sampled time series commonly occur in several domains where they present a significant challenge to standard deep learning models. In this paper, we propose a new deep learning framework for probabilistic interpolation of irregularly sampled time series that we call the Heteroscedastic Temporal Variational Autoencoder (HeTVAE). HeTVAE includes a novel input layer to encode information about input observation sparsity, a temporal VAE architecture to propagate uncertainty due to input sparsity, and a heteroscedastic output layer to enable variable uncertainty in output interpolations. Our results show that the proposed architecture is better able to reflect variable uncertainty through time due to sparse and irregular sampling than a range of baseline and traditional models, as well as recently proposed deep latent variable models that use homoscedastic output layers.
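异方差输出层的核心做法是:让模型为每个输出点同时预测均值和(对数)方差,并用高斯负对数似然(NLL)训练;同方差输出层则对所有点共用一个方差。下面的 NLL 函数是这一通用做法的示意(并非 HeTVAE 的原实现):

```python
import numpy as np

def gaussian_nll(y, mu, log_var):
    # 高斯负对数似然;异方差:log_var 逐点不同;同方差:log_var 为常数
    return 0.5 * np.mean(np.log(2.0 * np.pi) + log_var
                         + (y - mu) ** 2 / np.exp(log_var))
```

对每个点,NLL 在 log_var 等于该点残差平方的对数处取得最小值,因此当噪声随时间变化时,逐点方差(异方差)的似然严格优于共用方差(同方差)。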

优化|敛散性(4篇)

【1】 High Dimensional Differentially Private Stochastic Optimization with Heavy-tailed Data 标题:具有重尾数据的高维微分私有随机优化

作者:Lijie Hu,Shuo Ni,Hanshen Xiao,Di Wang 机构:King Abdullah University of Science and Technology, Saudi Arabia, University of Southern California, United States, Massachusetts Institute of Technology 链接:https://arxiv.org/abs/2107.11136 摘要:差分私有随机凸优化(DP-SCO)作为机器学习、统计和差分隐私等领域中最基本的问题之一,近年来得到了广泛的研究。然而,以往的工作大多只能处理规则数据分布或低维空间中的不规则数据。为了更好地理解不规则数据分布所带来的挑战,本文首次研究了高维空间中具有重尾数据的DP-SCO问题。在第一部分中,我们主要讨论一些多面体约束(如$\ell_1$-范数球)上的问题。我们证明了在$\epsilon$-DP模型中,如果损失函数是光滑的且其梯度具有有界二阶矩,则可以得到$\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$的(高概率)误差界(超额总体风险),其中$n$是样本大小,$d$是底层空间的维数。接下来,对于LASSO,如果数据分布具有有界四阶矩,我们在$(\epsilon,\delta)$-DP模型中将界改进为$\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$。第二部分研究了重尾数据下的稀疏学习。我们首先回顾稀疏线性模型,并提出一个截断的DP-IHT方法,其输出误差可以达到$\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$,其中$s^*$是底层参数的稀疏度。然后我们研究了稀疏性(即 $\ell_0$-范数)约束上的一个更一般的问题,并证明了如果损失函数是光滑且强凸的,则可以达到$\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$的误差,该误差在$\tilde{O}(\sqrt{s^*})$因子内也是近似最优的。 摘要:As one of the most fundamental problems in machine learning, statistics and differential privacy, Differentially Private Stochastic Convex Optimization (DP-SCO) has been extensively studied in recent years. However, most of the previous work can only handle either regular data distribution or irregular data in the low dimensional space case. To better understand the challenges arising from irregular data distribution, in this paper we provide the first study on the problem of DP-SCO with heavy-tailed data in the high dimensional space. In the first part we focus on the problem over some polytope constraint (such as the $\ell_1$-norm ball). We show that if the loss function is smooth and its gradient has bounded second order moment, it is possible to get a (high probability) error bound (excess population risk) of $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{1}{3}})$ in the $\epsilon$-DP model, where $n$ is the sample size and $d$ is the dimensionality of the underlying space. 
Next, for LASSO, if the data distribution that has bounded fourth-order moments, we improve the bound to $\tilde{O}(\frac{\log d}{(n\epsilon)^\frac{2}{5}})$ in the $(\epsilon, \delta)$-DP model. In the second part of the paper, we study sparse learning with heavy-tailed data. We first revisit the sparse linear model and propose a truncated DP-IHT method whose output could achieve an error of $\tilde{O}(\frac{s^{*2}\log d}{n\epsilon})$, where $s^*$ is the sparsity of the underlying parameter. Then we study a more general problem over the sparsity ({\em i.e.,} $\ell_0$-norm) constraint, and show that it is possible to achieve an error of $\tilde{O}(\frac{s^{*\frac{3}{2}}\log d}{n\epsilon})$, which is also near optimal up to a factor of $\tilde{O}{(\sqrt{s^*})}$, if the loss function is smooth and strongly convex.
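应对重尾数据的私有优化,常见模板是"逐样本截断 + 高斯噪声"的一步梯度更新;下面是这一通用模板的草图(仅作示意,论文中针对重尾分布的截断方式与隐私参数设定更为具体,此处的截断阈值、噪声尺度等均为假设):

```python
import numpy as np

def dp_sgd_step(w, per_sample_grads, clip=1.0, sigma=1.0, lr=0.1, rng=None):
    # 逐样本范数截断(缓解重尾梯度)+ 高斯噪声,再做一步梯度下降
    rng = np.random.default_rng() if rng is None else rng
    norms = np.linalg.norm(per_sample_grads, axis=1, keepdims=True)
    clipped = per_sample_grads * np.minimum(1.0, clip / np.maximum(norms, 1e-12))
    noise = rng.normal(0.0, sigma * clip, size=w.shape)
    g = (clipped.sum(axis=0) + noise) / len(per_sample_grads)
    return w - lr * g
```

截断保证了单个样本对更新的影响有界(敏感度至多为 clip),这正是加入与 clip 成比例的高斯噪声即可获得差分隐私保证的前提。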

【2】 Implicit Rate-Constrained Optimization of Non-decomposable Objectives 标题:不可分解目标的隐式速率约束优化

作者:Abhishek Kumar,Harikrishna Narasimhan,Andrew Cotter 备注:ICML 2021 链接:https://arxiv.org/abs/2107.10960 摘要:我们考虑机器学习中出现的一类常见约束优化问题,其目标是优化具有某种阈值形式的不可分解评价指标,同时对另一个感兴趣的度量施加约束。这类问题的例子包括在固定假阳性率下优化假阴性率、在固定召回率下优化精度、优化精度-召回曲线或ROC曲线下面积等。我们的核心思想是构建一个速率约束优化模型,借助隐函数定理将阈值参数表示为模型参数的函数。我们展示了如何用标准的基于梯度的方法求解由此产生的优化问题。在基准数据集上的实验表明,本文方法优于现有的最新方法。 摘要:We consider a popular family of constrained optimization problems arising in machine learning that involve optimizing a non-decomposable evaluation metric with a certain thresholded form, while constraining another metric of interest. Examples of such problems include optimizing the false negative rate at a fixed false positive rate, optimizing precision at a fixed recall, optimizing the area under the precision-recall or ROC curves, etc. Our key idea is to formulate a rate-constrained optimization that expresses the threshold parameter as a function of the model parameters via the Implicit Function theorem. We show how the resulting optimization problem can be solved using standard gradient based methods. Experiments on benchmark datasets demonstrate the effectiveness of our proposed method over existing state-of-the art approaches for these problems.
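"阈值是模型参数的函数"有一个直观的经验版本:在固定 FPR 约束下,阈值就是负类得分的分位数,FNR 随之被确定。下面的草图演示这一经验关系(并非论文中的可微隐函数实现,得分数据为假设):

```python
import numpy as np

def threshold_at_fpr(neg_scores, target_fpr):
    # 把阈值取为负类得分的 (1 - target_fpr) 分位数
    return np.quantile(neg_scores, 1.0 - target_fpr)

def fnr_at_fixed_fpr(pos_scores, neg_scores, target_fpr=0.1):
    t = threshold_at_fpr(neg_scores, target_fpr)
    fnr = float(np.mean(pos_scores <= t))
    fpr = float(np.mean(neg_scores > t))
    return fnr, fpr
```

论文的做法可以理解为把这种"分位数阈值"对模型参数求导(隐函数定理),从而让整个目标可以端到端地用梯度法优化。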

【3】 DeepTitle -- Leveraging BERT to generate Search Engine Optimized Headlines 标题:DeepTitle--利用BERT生成搜索引擎优化的标题

作者:Cristian Anastasiu,Hanna Behnke,Sarah Lück,Viktor Malesevic,Aamna Najmi,Javier Poveda-Panter 机构:Amazon Web Services, Seefeldstrasse , Zürich, Switzerland, SPRING Axel Springer Digital, News Media GmbH & Co. KG, Axel-Springer-Straße , Berlin, Germany, Sarah Lueck∗, Oskar-von-Miller-Ring , München, Germany 备注:9 pages, 4 figures 链接:https://arxiv.org/abs/2107.10935 摘要:为在线新闻文章自动生成标题并不是一项简单的任务——机器生成的标题需要语法正确、信息丰富、吸引注意力并生成搜索流量,而不是“点击诱饵”或“假新闻”。在本文中,我们展示了如何利用预先训练的语言模型来创建一个抽象的德语新闻标题生成器。我们将最先进的微调技术用于抽象文本摘要,即我们对编码器和解码器使用不同的优化器,其中前者是预先训练的,后者是从头开始训练的。我们修改了标题生成,以包含搜索引擎优化相关的常用关键字。我们在一个德国新闻数据集上进行了实验,获得了40.02分的胭脂-L-克F-分数。此外,我们还通过引入句子相似度度量和人类评价,解决了ROUGE在文本摘要质量评价中的局限性。 摘要:Automated headline generation for online news articles is not a trivial task - machine generated titles need to be grammatically correct, informative, capture attention and generate search traffic without being "click baits" or "fake news". In this paper we showcase how a pre-trained language model can be leveraged to create an abstractive news headline generator for German language. We incorporate state of the art fine-tuning techniques for abstractive text summarization, i.e. we use different optimizers for the encoder and decoder where the former is pre-trained and the latter is trained from scratch. We modify the headline generation to incorporate frequently sought keywords relevant for search engine optimization. We conduct experiments on a German news data set and achieve a ROUGE-L-gram F-score of 40.02. Furthermore, we address the limitations of ROUGE for measuring the quality of text summarization by introducing a sentence similarity metric and human evaluation.
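摘要提到对编码器与解码器使用不同的优化器(前者预训练、后者从头训练)。下面用NumPy给出"按参数组设置不同学习率"这一做法的最小草图;具体学习率数值为假设示例,实际工作中是在深度学习框架里为两组参数各建一个优化器。

```python
import numpy as np

def sgd_step(params, grads, lrs):
    """带分组学习率的一步SGD。

    params/grads: 组名 -> 参数/梯度数组;lrs: 组名 -> 学习率。
    预训练的编码器通常用远小于从头训练的解码器的学习率。
    """
    return {k: params[k] - lrs[k] * grads[k] for k in params}

# 示例:编码器(预训练)用小学习率,解码器(从头训练)用大学习率
params = {"encoder": np.array([1.0]), "decoder": np.array([1.0])}
grads = {"encoder": np.array([1.0]), "decoder": np.array([1.0])}
lrs = {"encoder": 1e-4, "decoder": 1e-2}
new = sgd_step(params, grads, lrs)
```

同样的梯度下,编码器被更新得更慢,从而保留预训练知识。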

【4】 Finite-Bit Quantization For Distributed Algorithms With Linear Convergence 标题:线性收敛分布式算法的有限位量化

作者:Chang-Shen Lee,Nicolò Michelusi,Gesualdo Scutari 机构: Michelusi is with the School of Electrical, Arizona State University 备注:Submitted to the IEEE Transactions on Information Theory 链接:https://arxiv.org/abs/2107.11304 摘要:本文研究量化通信条件下网状网络上(强凸)组合优化问题的分布式算法。我们不关注特定的算法设计,而是提出一个黑盒模型,将分布式算法表述为以线性速率收敛的不动点迭代。该算法模型与量化器设计上一种新的(随机)有偏压缩(BC-)规则相结合,保持了线性收敛性。我们还提出了一种新的量化器及与之配套的高效通信编码方案,仅用有限的比特数即可高效实现BC规则。这与大多数现有量化规则形成对比:后者的实现需要无限多的比特。我们对黑盒模型进行了统一的通信复杂度分析,确定了在所需精度内达到优化问题的解所需的平均比特数。数值结果验证了我们的理论发现,并表明采用该量化器的分布式算法比采用现有量化规则的算法具有更优的通信复杂度。 摘要:This paper studies distributed algorithms for (strongly convex) composite optimization problems over mesh networks, subject to quantized communications. Instead of focusing on a specific algorithmic design, we propose a black-box model casting distributed algorithms in the form of fixed-point iterates, converging at linear rate. The algorithmic model is coupled with a novel (random) Biased Compression (BC-)rule on the quantizer design, which preserves linear convergence. A new quantizer coupled with a communication-efficient encoding scheme is also proposed, which efficiently implements the BC-rule using a finite number of bits. This contrasts with most of existing quantization rules, whose implementation calls for an infinite number of bits. A unified communication complexity analysis is developed for the black-box model, determining the average number of bit required to reach a solution of the optimization problem within the required accuracy. Numerical results validate our theoretical findings and show that distributed algorithms equipped with the proposed quantizer have more favorable communication complexity than algorithms using existing quantization rules.
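摘要未给出BC规则的具体形式;作为背景示意,下面是一个通用的有限比特随机量化器草图:把输入裁剪到有限动态范围 $[-r, r]$,再在 $2^b$ 个等级上做随机舍入。范围内的量化在均值意义下无偏,而范围外的裁剪正是引入"偏差"的来源——论文的BC规则要控制的正是这类偏差。参数均为示例。

```python
import numpy as np

def stochastic_quantize(x, b, r, rng=None):
    """有限比特量化草图:x 裁剪到 [-r, r],映射到 2^b 个等级并随机舍入。"""
    rng = np.random.default_rng(rng)
    levels = 2 ** b - 1
    xc = np.clip(x, -r, r)                       # 裁剪:有限比特的代价(偏差来源)
    u = (xc + r) / (2 * r) * levels              # 在 [0, levels] 上的位置
    low = np.floor(u)
    q = low + (rng.random(np.shape(x)) < (u - low))  # 随机舍入
    return q / levels * 2 * r - r
```

范围内的逐点误差不超过一个量化步长 $2r/(2^b-1)$。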

预测|估计(6篇)

【1】 Human Pose Regression with Residual Log-likelihood Estimation 标题:基于残差对数似然估计的人体姿态回归

作者:Jiefeng Li,Siyuan Bian,Ailing Zeng,Can Wang,Bo Pang,Wentao Liu,Cewu Lu 机构:Shanghai Jiao Tong University, The Chinese University of Hong Kong, SenseTime Research 备注:ICCV 2021 Oral 链接:https://arxiv.org/abs/2107.11291 摘要:在人体姿态估计领域,基于热图的方法通过似然热图对输出分布建模,占据主导地位。相比之下,基于回归的方法效率更高,但性能较差。在这项工作中,我们探索利用最大似然估计(MLE)来发展一种高效且有效的回归方法。从MLE的角度看,采用不同的回归损失就是对输出密度函数作出不同的假设。密度函数越接近真实分布,回归性能越好。有鉴于此,我们提出了一个带残差对数似然估计(RLE)的新回归范式,用以捕捉潜在的输出分布。具体来说,RLE学习的是分布的变化量,而非无参照的潜在分布,从而简化训练过程。通过所提出的重参数化设计,我们的方法与现成的流模型兼容。该方法有效、高效、灵活。我们通过综合实验展示了它在各种人体姿态估计任务中的潜力。与传统的回归范式相比,RLE回归在MSCOCO上带来12.4 mAP的提升,且没有任何测试时间开销。此外,我们的回归方法首次优于基于热图的方法,特别是在多人姿态估计方面。我们的代码在https://github.com/Jeff-sjtu/res-loglikelihood-regression 摘要:Heatmap-based methods dominate in the field of human pose estimation by modelling the output distribution through likelihood heatmaps. In contrast, regression-based methods are more efficient but suffer from inferior performance. In this work, we explore maximum likelihood estimation (MLE) to develop an efficient and effective regression-based methods. From the perspective of MLE, adopting different regression losses is making different assumptions about the output density function. A density function closer to the true distribution leads to a better regression performance. In light of this, we propose a novel regression paradigm with Residual Log-likelihood Estimation (RLE) to capture the underlying output distribution. Concretely, RLE learns the change of the distribution instead of the unreferenced underlying distribution to facilitate the training process. With the proposed reparameterization design, our method is compatible with off-the-shelf flow models. The proposed method is effective, efficient and flexible. We show its potential in various human pose estimation tasks with comprehensive experiments. Compared to the conventional regression paradigm, regression with RLE bring 12.4 mAP improvement on MSCOCO without any test-time overhead.
Moreover, for the first time, especially on multi-person pose estimation, our regression method is superior to the heatmap-based methods. Our code is available at https://github.com/Jeff-sjtu/res-loglikelihood-regression
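摘要中"采用不同的回归损失等价于对输出密度作不同假设"可以用两行公式直接验证:平方误差对应高斯密度的负对数似然,绝对误差对应拉普拉斯密度的负对数似然。下面的草图仅说明这一对应关系,并非论文中RLE/标准化流的实现。

```python
import numpy as np

def gaussian_nll(pred, target, sigma=1.0):
    """高斯负对数似然(略去常数项)== 缩放后的平方误差。"""
    return 0.5 * ((pred - target) / sigma) ** 2 + np.log(sigma)

def laplace_nll(pred, target, b=1.0):
    """拉普拉斯负对数似然 == 缩放后的绝对误差(含归一化常数)。"""
    return np.abs(pred - target) / b + np.log(2 * b)
```

RLE的做法正是在此基础上,用流模型学习一个比高斯/拉普拉斯更贴近真实残差分布的密度。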

【2】 Data-driven deep density estimation 标题:数据驱动的深层密度估计

作者:Patrik Puchert,Pedro Hermosilla,Tobias Ritschel,Timo Ropinski 机构:� The Author(s) 备注:35 pages, 25 figures. Puplished in Neural Computing and Applications (2021). The method described is available as python pip pachage deep_density_estimation and on github this https URL DDE 链接:https://arxiv.org/abs/2107.11085 摘要:密度估计在许多数据分析任务中起着至关重要的作用,因为它从离散样本中推断出连续的概率密度函数(PDF)。因此,它被用于各种各样的任务,如分析人口数据、二维传感器读数中的空间位置或从三维扫描重建场景。在本文中,我们引入了一种学习的,数据驱动的深度密度估计(DDE)来精确而有效地推断PDF,同时与域维数或样本大小无关。此外,在估计过程中,我们不需要访问原始PDF,既不是参数形式,也不是先验形式,也不是许多样本形式。这是通过在无限合成PDF流上训练非结构化卷积神经网络实现的,因为与任何自然有限训练数据相比,大量的合成训练数据在一组自然PDF上的泛化效果更好。因此,我们希望我们公开的DDE方法在数据分析的许多领域都是有益的,在这些领域中,连续模型是从离散的观测值来估计的。 摘要:Density estimation plays a crucial role in many data analysis tasks, as it infers a continuous probability density function (PDF) from discrete samples. Thus, it is used in tasks as diverse as analyzing population data, spatial locations in 2D sensor readings, or reconstructing scenes from 3D scans. In this paper, we introduce a learned, data-driven deep density estimation (DDE) to infer PDFs in an accurate and efficient manner, while being independent of domain dimensionality or sample size. Furthermore, we do not require access to the original PDF during estimation, neither in parametric form, nor as priors, or in the form of many samples. This is enabled by training an unstructured convolutional neural network on an infinite stream of synthetic PDFs, as unbound amounts of synthetic training data generalize better across a deck of natural PDFs than any natural finite training data will do. Thus, we hope that our publicly available DDE method will be beneficial in many areas of data analysis, where continuous models are to be estimated from discrete observations.
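摘要中"在无限合成PDF流上训练"的关键是训练对的生成方式:随机抽取一个合成PDF、从中采样、并给出采样点处的真实密度作为监督。下面是这一数据生成器的假设性NumPy草图(以随机高斯混合为合成PDF族;论文使用的具体PDF族与参数范围此处为示例)。

```python
import numpy as np

def synthetic_pdf_batch(rng, n_samples=128, max_comp=4):
    """抽取一个随机高斯混合PDF,返回 (采样点, 采样点处的真实密度)。
    反复调用即得到"无限流"式的训练对。"""
    k = int(rng.integers(1, max_comp + 1))
    means = rng.uniform(-3, 3, k)
    stds = rng.uniform(0.2, 1.0, k)
    weights = rng.dirichlet(np.ones(k))
    comp = rng.choice(k, n_samples, p=weights)   # 每个样本所属分量
    x = rng.normal(means[comp], stds[comp])
    dens = sum(w * np.exp(-0.5 * ((x - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))
               for w, m, s in zip(weights, means, stds))
    return x, dens
```

密度估计网络随后以 x 为输入、以 dens 为回归目标进行训练。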

【3】 Size doesn't matter: predicting physico- or biochemical properties based on dozens of molecules 标题:大小并不重要:根据几十个分子预测物理或生化特性

作者:Kirill Karpov,Artem Mitrofanov,Vadim Korolev,Valery Tkachenko 机构:Lomonosov Moscow State University, Department of Chemistry, Leninskie gory, bld. , Moscow , Russia, Science Data Software, LLC, Forest Landing Cir, Rockville, MD , USA 备注:9 pages, 6 figures 链接:https://arxiv.org/abs/2107.10882 摘要:在化学中使用机器学习已经成为一种普遍的做法。同时,尽管现代机器学习方法取得了成功,但数据的缺乏限制了它们的使用。使用迁移学习方法可以帮助解决这个问题。这种方法假定,建立在足够数量数据基础上的模型能够捕捉到化合物结构的一般特征,而在缺乏数据的数据集上进一步重用这些特征将大大提高新模型的质量。在本文中,我们针对小有机分子发展了这一方法,用图卷积神经网络实现迁移学习。本文表明,在数据缺乏的情况下,目标性质模型的性能有了显著提升。我们还考察了数据集组成对模型质量的影响,以及所得模型的适用域。 摘要:The use of machine learning in chemistry has become a common practice. At the same time, despite the success of modern machine learning methods, the lack of data limits their use. Using a transfer learning methodology can help solve this problem. This methodology assumes that a model built on a sufficient amount of data captures general features of the chemical compound structure on which it was trained and that the further reuse of these features on a dataset with a lack of data will greatly improve the quality of the new model. In this paper, we develop this approach for small organic molecules, implementing transfer learning with graph convolutional neural networks. The paper shows a significant improvement in the performance of models for target properties with a lack of data. The effects of the dataset composition on model quality and the applicability domain of the resulting models are also considered.

【4】 State, global and local parameter estimation using local ensemble Kalman filters: applications to online machine learning of chaotic dynamics 标题:基于局部集成卡尔曼滤波的状态、全局和局部参数估计:在混沌动力学在线机器学习中的应用

作者:Quentin Malartic,Alban Farchi,Marc Bocquet 机构:CEREA, ´Ecole des Ponts and EDF R&D, ˆIle–de–France, France, LMDIPSL, ENS, PSL Universit´e, ´Ecole Polytechnique, Institut Polytechnique de Paris, Sorbonne Universit´e, CNRS, Paris, France 链接:https://arxiv.org/abs/2107.11253 摘要:最近的研究表明,将机器学习方法与数据同化相结合,仅利用系统的稀疏和噪声观测就可以重构出一个动力系统。同样的方法也可以用来修正基于知识的模型的错误。由此产生的代理模型是混合的,统计部分补充了物理部分。在实践中,可以将校正作为一个集成项(即模型预解式中的{textit})或直接添加到物理模型的趋势中。这种预解校正方法易于实现。趋势修正更具技术性,特别是需要物理模型的伴随,而且更具灵活性。利用双尺度Lorenz模型对两种方法进行了比较。采用预解修正和趋势修正的替代模型在长期预报实验中的精度有一定的相似性。相比之下,在资料同化实验中,使用趋势修正的替代模式显著优于使用预解修正的替代模式。最后,我们证明了趋势修正开启了在线模型误差修正的可能性,即当新的观测值可用时,逐步改进模型。该算法可以看作是弱约束4D-Var的一种新形式。我们将在线学习和离线学习与双尺度Lorenz系统进行了比较,结果表明,在线学习可以从稀疏和噪声观测中提取所有信息。 摘要:Recent studies have shown that it is possible to combine machine learning methods with data assimilation to reconstruct a dynamical system using only sparse and noisy observations of that system. The same approach can be used to correct the error of a knowledge-based model. The resulting surrogate model is hybrid, with a statistical part supplementing a physical part. In practice, the correction can be added as an integrated term (\textit{i.e.} in the model resolvent) or directly inside the tendencies of the physical model. The resolvent correction is easy to implement. The tendency correction is more technical, in particular it requires the adjoint of the physical model, but also more flexible. We use the two-scale Lorenz model to compare the two methods. The accuracy in long-range forecast experiments is somewhat similar between the surrogate models using the resolvent correction and the tendency correction. By contrast, the surrogate models using the tendency correction significantly outperform the surrogate models using the resolvent correction in data assimilation experiments. Finally, we show that the tendency correction opens the possibility to make online model error correction, \textit{i.e.} improving the model progressively as new observations become available. 
The resulting algorithm can be seen as a new formulation of weak-constraint 4D-Var. We compare online and offline learning using the same framework with the two-scale Lorenz system, and show that with online learning, it is possible to extract all the information from sparse and noisy observations.
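摘要对比的"预解修正"与"趋势修正"的区别,可以用一步显式欧拉积分直观示意:前者把修正项 g 加在积分之后(模型预解式上),后者把 g 加进趋势 f 内部再积分。这只是概念示意,并非双尺度Lorenz模型或弱约束4D-Var的实现。

```python
import numpy as np

def step_resolvent(x, f, g, dt):
    """修正加在模型预解式之后:x_{k+1} = M(x_k) + g(x_k)。"""
    return x + dt * f(x) + g(x)

def step_tendency(x, f, g, dt):
    """修正加在趋势内部:x_{k+1} = x_k + dt * (f(x_k) + g(x_k))。"""
    return x + dt * (f(x) + g(x))
```

同样的 g,两种放置方式一般给出不同的迭代;趋势修正需要对 f+g 整体积分(因而需要物理模型的伴随),但更灵活。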

【5】 Economic Recession Prediction Using Deep Neural Network 标题:基于深度神经网络的经济衰退预测

作者:Zihao Wang,Kun Li,Steve Q. Xia,Hongfu Liu 机构:Michtom School of Computer Science, Brandeis University, Waltham, MA, United, Guardian Life, Hudson Yard, New York, NY, United States 链接:https://arxiv.org/abs/2107.10980 摘要:我们研究了不同机器学习方法在预测经济周期中的有效性。我们确定带自动编码器的Bi-LSTM深度学习方法是预测美国经济衰退开始和结束的最精确模型。我们采用常用的宏观与市场条件特征,比较不同机器学习模型在样本内和样本外产生良好预测的能力。当预测变量和模型系数随时间变化时,该模型具有灵活性和动态性。它为过去两次衰退提供了很好的样本外预测,并为COVID-19衰退提供了早期预警。 摘要:We investigate the effectiveness of different machine learning methodologies in predicting economic cycles. We identify the deep learning methodology of Bi-LSTM with Autoencoder as the most accurate model to forecast the beginning and end of economic recessions in the U.S. We adopt commonly-available macro and market-condition features to compare the ability of different machine learning models to generate good predictions both in-sample and out-of-sample. The proposed model is flexible and dynamic when both predictive variables and model coefficients vary over time. It provided good out-of-sample predictions for the past two recessions and early warning about the COVID-19 recession.

【6】 Linear Polytree Structural Equation Models: Structural Learning and Inverse Correlation Estimation 标题:线性多叉树结构方程模型:结构学习和逆相关估计

作者:Xingmei Lou,Yu Hu,Xiaodong Li 机构:Department of Statistics, University of California, Davis, Department of Mathematics and Division of Life Science, Hong Kong University of, Science and Technology 备注:27 pages, 3 figures 链接:https://arxiv.org/abs/2107.10955 摘要:我们关注的问题是:当数据由线性结构方程模型(SEM)生成、且因果结构可由一棵多叉树(polytree)刻画时,学习有向无环图(DAG)。特别地,在高斯和次高斯模型下,我们研究了著名的Chow-Liu算法精确恢复由CPDAG唯一表示的polytree等价类所需的样本量条件。我们还研究了在这类模型下估计逆相关矩阵的误差率。我们的理论结果由综合的数值模拟加以说明;在基准数据上的实验也表明,当真实图结构只能由一棵多叉树近似时,该方法仍具有鲁棒性。 摘要:We are interested in the problem of learning the directed acyclic graph (DAG) when data are generated from a linear structural equation model (SEM) and the causal structure can be characterized by a polytree. Specifically, under both Gaussian and sub-Gaussian models, we study the sample size conditions for the well-known Chow-Liu algorithm to exactly recover the equivalence class of the polytree, which is uniquely represented by a CPDAG. We also study the error rate for the estimation of the inverse correlation matrix under such models. Our theoretical findings are illustrated by comprehensive numerical simulations, and experiments on benchmark data also demonstrate the robustness of the method when the ground truth graphical structure can only be approximated by a polytree.
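摘要中的Chow-Liu算法在高斯数据上有一个非常紧凑的骨架:以两两互信息 $I(i,j) = -\tfrac{1}{2}\log(1-\rho_{ij}^2)$ 为边权,求最大权生成树。下面的NumPy草图(用Prim算法)只恢复树的骨架,不含CPDAG定向步骤,仅用于说明该方法的核心。

```python
import numpy as np

def chow_liu_tree(X):
    """高斯数据上的Chow-Liu骨架:对两两互信息求最大权生成树。"""
    d = X.shape[1]
    rho = np.corrcoef(X, rowvar=False)
    mi = -0.5 * np.log(np.clip(1 - rho ** 2, 1e-12, None))  # 成对互信息
    in_tree = {0}
    edges = []
    while len(in_tree) < d:                  # Prim算法:逐边加入最大权边
        best = None
        for i in in_tree:
            for j in range(d):
                if j not in in_tree and (best is None or mi[i, j] > best[2]):
                    best = (i, j, mi[i, j])
        edges.append((best[0], best[1]))
        in_tree.add(best[1])
    return edges
```

在一条因果链 X0 → X1 → X2 生成的数据上,该骨架应恢复边 {0,1} 与 {1,2}。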

其他神经网络|深度学习|模型|建模(17篇)

【1】 Rethinking Hard-Parameter Sharing in Multi-Task Learning 标题:对多任务学习中参数硬共享的再思考

作者:Lijun Zhang,Qizheng Yang,Xiao Liu,Hui Guan 机构:University of Massachusetts Amherst 备注:15 pages, 6 figures 链接:https://arxiv.org/abs/2107.11359 摘要:多任务学习(MTL)中的硬参数共享允许任务共享部分模型参数,降低存储成本,提高预测精度。常见的共享实践是在任务之间共享深层神经网络的底层,同时为每个任务使用单独的顶层。在这项工作中,我们通过对细粒度图像分类任务的实证研究重新审视了这种常见做法,并得到两个令人惊讶的观察:(1) 使用单独的底层参数可以获得显著优于通常做法的性能,且这一现象在不同主干结构、不同任务数量、不同任务特定参数量的联合训练设置下均成立;(2) 一个底层任务特定参数占比很小的多任务模型,可以达到与对每个任务分别训练的独立模型相竞争的性能,并优于最先进的MTL框架。我们的观察结果提示人们重新思考当前的共享范式,并采用使用单独底层参数的新策略,作为MTL模型设计中更强的基线。 摘要:Hard parameter sharing in multi-task learning (MTL) allows tasks to share some of model parameters, reducing storage cost and improving prediction accuracy. The common sharing practice is to share bottom layers of a deep neural network among tasks while using separate top layers for each task. In this work, we revisit this common practice via an empirical study on fine-grained image classification tasks and make two surprising observations. (1) Using separate bottom-layer parameters could achieve significantly better performance than the common practice and this phenomenon holds for different number of tasks jointly trained on different backbone architectures with different quantity of task-specific parameters. (2) A multi-task model with a small proportion of task-specific parameters from bottom layers can achieve competitive performance with independent models trained on each task separately and outperform a state-of-the-art MTL framework. Our observations suggest that people rethink the current sharing paradigm and adopt the new strategy of using separate bottom-layer parameters as a stronger baseline for model design in MTL.
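"共享底层 vs. 每任务独立底层"的存储代价可以用一个参数计数小函数量化示意(两层网络、忽略偏置项;层宽与任务数均为假设示例,仅为说明两种范式的参数量差异)。

```python
def param_counts(d_in, d_hid, d_out, n_tasks, separate_bottom):
    """两层多任务网络的参数量:底层 d_in->d_hid,每任务顶层 d_hid->d_out。

    separate_bottom=False 为常见的共享底层做法;
    separate_bottom=True 为论文考察的"每任务独立底层"策略。
    """
    bottom = d_in * d_hid
    top = d_hid * d_out
    n_bottom = n_tasks if separate_bottom else 1
    return n_bottom * bottom + n_tasks * top
```

例如 4 个任务、输入 100、隐层 64、输出 10 时,共享底层 8960 个参数,独立底层 28160 个——论文的观察是,以这部分额外存储换来的精度提升往往是值得的。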

【2】 Tackling the Overestimation of Forest Carbon with Deep Learning and Aerial Imagery 标题:利用深度学习和航空影像解决森林碳高估问题

作者:Gyri Reiersen,David Dao,Björn Lütjens,Konstantin Klemmer,Xiaoxiang Zhu,Ce Zhang 机构:Equal contribution 1Department of Informatics, TechnicalUniversity of Munich, Germany 2Department of Com-puter Science, Switzerland 3Department ofAeronautics and Astronautic, Massachusetts Institute of Tech-nology 备注:Spotlight talk at the Tackling Climate Change with Machine Learning workshop at the ICML 2021 this https URL 链接:https://arxiv.org/abs/2107.11320 摘要:森林碳补偿越来越受欢迎,可以在资助气候缓解、森林保护和重新造林方面发挥重要作用。然而,测量森林中储存了多少碳,目前仍主要依靠昂贵、耗时、有时甚至难以核查的实地测量来完成。为了克服这些限制,许多核查机构正在利用机器学习(ML)算法从卫星或航空影像估算森林碳。航空影像允许对树种或科进行分类,从而改进基于卫星影像的森林类型分类。然而,航空影像的采集成本明显更高,且更高的分辨率能在多大程度上改善森林碳估算尚不清楚。本文针对一个热带再造林项目,首次利用基于深度学习的算法,系统地比较了来自航空影像、卫星影像和地面实测数据的森林碳估算。我们的初步结果表明,对于热带再造林项目,基于卫星影像的森林碳估算可能将地上生物量高估10倍以上。航空与卫星森林碳测量之间的显著差异显示了基于航空影像的机器学习(ML)算法的潜力,并凸显了将本研究扩展为各种碳测量方案之间全球基准的重要性。 摘要:Forest carbon offsets are increasingly popular and can play a significant role in financing climate mitigation, forest conservation, and reforestation. Measuring how much carbon is stored in forests is, however, still largely done via expensive, time-consuming, and sometimes unaccountable field measurements. To overcome these limitations, many verification bodies are leveraging machine learning (ML) algorithms to estimate forest carbon from satellite or aerial imagery. Aerial imagery allows for tree species or family classification, which improves the satellite imagery-based forest type classification. However, aerial imagery is significantly more expensive to collect and it is unclear by how much the higher resolution improves the forest carbon estimation. This proposal paper describes the first systematic comparison of forest carbon estimation from aerial imagery, satellite imagery, and ground-truth field measurements via deep learning-based algorithms for a tropical reforestation project.
Our initial results show that forest carbon estimates from satellite imagery can overestimate above-ground biomass by more than 10-times for tropical reforestation projects. The significant difference between aerial and satellite-derived forest carbon measurements shows the potential for aerial imagery-based ML algorithms and raises the importance to extend this study to a global benchmark between options for carbon measurements.

【3】 Machine Learning with a Reject Option: A survey 标题:带拒绝选项的机器学习:综述

作者:Kilian Hendrickx,Lorenzo Perini,Dries Van der Plas,Wannes Meert,Jesse Davis 机构: BelgiumUniversity of Antwerp 链接:https://arxiv.org/abs/2107.11277 摘要:机器学习模型总是做出预测,即使它可能是不准确的。在许多决策支持应用程序中应该避免这种行为,因为错误可能会带来严重后果。尽管已经在1970年进行了研究,但带有拒绝选项的机器学习最近引起了人们的兴趣。这个机器学习子域使机器学习模型能够避免在可能出错时进行预测。这项调查的目的是提供一个概述机器学习与拒绝选项。我们介绍了导致两类拒绝的条件:歧义拒绝和新奇拒绝。此外,我们定义了现有的拒绝模型的体系结构,描述了训练此类模型的标准学习策略,并将传统的机器学习技术与拒绝联系起来。此外,我们回顾了评估模型预测和拒绝质量的策略。最后,我们提供了相关应用领域的例子,并展示了拒绝机器学习与其他机器学习研究领域的关系。 摘要:Machine learning models always make a prediction, even when it is likely to be inaccurate. This behavior should be avoided in many decision support applications, where mistakes can have severe consequences. Albeit already studied in 1970, machine learning with a reject option recently gained interest. This machine learning subfield enables machine learning models to abstain from making a prediction when likely to make a mistake. This survey aims to provide an overview on machine learning with a reject option. We introduce the conditions leading to two types of rejection, ambiguity and novelty rejection. Moreover, we define the existing architectures for models with a reject option, describe the standard learning strategies to train such models and relate traditional machine learning techniques to rejection. Additionally, we review strategies to evaluate a model's predictive and rejective quality. Finally, we provide examples of relevant application domains and show how machine learning with rejection relates to other machine learning research areas.
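综述本身不限定某一种拒绝规则;作为其中"歧义拒绝"一类的最经典示例,下面给出基于置信度阈值的拒绝(Chow规则)的最小草图。阈值0.8为示例值;新奇拒绝通常还需要额外的分布外检测模型,此处不涉及。

```python
import numpy as np

def predict_with_reject(probs, threshold=0.8):
    """Chow式歧义拒绝:最大类别概率低于阈值时弃权(输出 -1)。

    probs: 形状 (n, n_classes) 的类别概率矩阵。
    """
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    pred[conf < threshold] = -1   # -1 表示"拒绝预测"
    return pred
```

阈值权衡的正是综述所述的"预测质量 vs. 拒绝率":阈值越高,覆盖率越低,但保留预测的准确率越高。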

【4】 Taxonomizing local versus global structure in neural network loss landscapes 标题:神经网络损失景观中局部结构与全局结构的分类

作者:Yaoqing Yang,Liam Hodgkinson,Ryan Theisen,Joe Zou,Joseph E. Gonzalez,Kannan Ramchandran,Michael W. Mahoney 机构:University of California, Berkeley 链接:https://arxiv.org/abs/2107.11228 摘要:从损失景观的角度考察神经网络模型,在学习的统计力学方法中有着悠久的历史;近年来,它也在机器学习领域本身受到了广泛关注。除此之外,局部度量(如损失景观的平滑度)已被证明与模型的全局属性(如良好的泛化)相关。在这里,我们对数千个神经网络模型的损失景观结构进行了详细的实证分析,系统地改变学习任务、模型结构和/或数据的数量/质量。通过考虑一系列试图捕捉损失景观不同方面的度量,我们证明了最佳测试精度是在以下情况下获得的:损失景观在全局上良好连通;训练模型的集合彼此更相似;模型收敛到局部光滑区域。我们还表明,当模型很小或在较低质量的数据上训练时,可能出现全局连通性较差的景观;而且,如果损失景观在全局上连通性差,那么训练到零损失实际上会导致更差的测试精度。 摘要:Viewing neural network models in terms of their loss landscapes has a long history in the statistical mechanics approach to learning, and in recent years it has received attention within machine learning proper. Among other things, local metrics (such as the smoothness of the loss landscape) have been shown to correlate with global properties of the model (such as good generalization). Here, we perform a detailed empirical analysis of the loss landscape structure of thousands of neural network models, systematically varying learning tasks, model architectures, and/or quantity/quality of data. By considering a range of metrics that attempt to capture different aspects of the loss landscape, we demonstrate that the best test accuracy is obtained when: the loss landscape is globally well-connected; ensembles of trained models are more similar to each other; and models converge to locally smooth regions. We also show that globally poorly-connected landscapes can arise when models are small or when they are trained to lower quality data; and that, if the loss landscape is globally poorly-connected, then training to zero loss can actually lead to worse test accuracy.
Based on these results, we develop a simple one-dimensional model with load-like and temperature-like parameters, we introduce the notion of an \emph{effective loss landscape} depending on these parameters, and we interpret our results in terms of a \emph{rugged convexity} of the loss landscape. When viewed through this lens, our detailed empirical results shed light on phases of learning (and consequent double descent behavior), fundamental versus incidental determinants of good generalization, the role of load-like and temperature-like parameters in the learning process, different influences on the loss landscape from model and data, and the relationships between local and global metrics, all topics of recent interest.

【5】 Wavelet Design in a Learning Framework 标题:学习框架中的小波设计

作者:Dhruv Jawali,Abhishek Kumar,Chandra Sekhar Seelamantula 备注:This work has been submitted to the IEEE Transactions on Pattern Analysis and Machine Intelligence for possible publication 链接:https://arxiv.org/abs/2107.11225 摘要:小波在许多信号和图像处理应用中被证明是非常成功的。二十多年来,小波设计一直是一个活跃的研究领域,人们通常从解析的角度来研究这个问题。本文介绍了一种基于学习的小波设计方法。我们将卷积自编码器与小波多分辨率逼近进行类比,并展示学习视角如何为解决设计问题提供一个连贯的计算框架。我们的目标是通过训练滤波器组自编码器来设计与数据无关的小波,从而无需定制数据集。事实上,我们使用高维高斯向量来训练滤波器组自编码器,并证明接近零的训练损失意味着所学滤波器以非常高的概率满足完美重建性质。小波的正交性、紧支撑性、光滑性、对称性和消失矩等性质,可以通过适当设计自编码器结构,并在学习所用的均方误差代价中添加合适的正则化项来引入。 摘要:Wavelets have proven to be highly successful in several signal and image processing applications. Wavelet design has been an active field of research for over two decades, with the problem often being approached from an analytical perspective. In this paper, we introduce a learning based approach to wavelet design. We draw a parallel between convolutional autoencoders and wavelet multiresolution approximation, and show how the learning angle provides a coherent computational framework for addressing the design problem. We aim at designing data-independent wavelets by training filterbank autoencoders, which precludes the need for customized datasets. In fact, we use high-dimensional Gaussian vectors for training filterbank autoencoders, and show that a near-zero training loss implies that the learnt filters satisfy the perfect reconstruction property with very high probability. Properties of a wavelet such as orthogonality, compact support, smoothness, symmetry, and vanishing moments can be incorporated by designing the autoencoder architecture appropriately and with a suitable regularization term added to the mean-squared error cost used in the learning process.
Our approach not only recovers the well known Daubechies family of orthogonal wavelets and the Cohen-Daubechies-Feauveau family of symmetric biorthogonal wavelets, but also learns wavelets outside these families.
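摘要中的"完美重建性质"可以用最简单的Haar滤波器组直接验证:一层分解(对应滤波器组自编码器的编码端)加一层重建(解码端)应恰好还原输入信号。下面的草图只是对该性质的示意,并非论文的学习式设计流程。

```python
import numpy as np

def haar_analysis(x):
    """一层Haar分解("编码器"):近似系数 a 与细节系数 d。"""
    a = (x[0::2] + x[1::2]) / np.sqrt(2)
    d = (x[0::2] - x[1::2]) / np.sqrt(2)
    return a, d

def haar_synthesis(a, d):
    """一层Haar重建("解码器");Haar满足完美重建性质。"""
    x = np.empty(2 * len(a))
    x[0::2] = (a + d) / np.sqrt(2)
    x[1::2] = (a - d) / np.sqrt(2)
    return x
```

正交性还意味着能量守恒:‖x‖² = ‖a‖² + ‖d‖²。论文的训练目标正是让学习到的滤波器在均方误差接近零时自动满足这一性质。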

【6】 Bias Loss for Mobile Neural Networks 标题:移动神经网络的偏置损耗

作者:Lusine Abrahamyan,Valentin Ziatchin,Yiming Chen,Nikos Deligiannis 机构:Vrije Universiteit Brussel, Brussels, Belgium, PicsArt Inc., San Francisco, USA 备注:Accepted at ICCV2021 链接:https://arxiv.org/abs/2107.11170 摘要:近年来,紧凑卷积神经网络(CNN)在性能上取得了显著提升。然而,它们仍然无法提供与参数量庞大的CNN相同的预测能力。各层捕捉到的多样乃至丰富的特征,是这些成功CNN的一个重要特性。然而,大型CNN与紧凑型CNN在这一特性上的差异却鲜有研究。在紧凑型CNN中,由于参数数目有限,难以获得丰富的特征,特征多样性因而成为一个关键特性。在模型推理期间,由数据点得到的激活图中的多样特征,可能表明存在一组区分不同类别对象所必需的唯一描述符。相比之下,特征多样性低的数据点可能无法提供足够数量的唯一描述符来作出有效预测;我们称之为随机预测。随机预测会对优化过程产生负面影响,并损害最终性能。 摘要:Compact convolutional neural networks (CNNs) have witnessed exceptional improvements in performance in recent years. However, they still fail to provide the same predictive power as CNNs with a large number of parameters. The diverse and even abundant features captured by the layers is an important characteristic of these successful CNNs. However, differences in this characteristic between large CNNs and their compact counterparts have rarely been investigated. In compact CNNs, due to the limited number of parameters, abundant features are unlikely to be obtained, and feature diversity becomes an essential characteristic. Diverse features present in the activation maps derived from a data point during model inference may indicate the presence of a set of unique descriptors necessary to distinguish between objects of different classes. In contrast, data points with low feature diversity may not provide a sufficient amount of unique descriptors to make a valid prediction; we refer to them as random predictions. Random predictions can negatively impact the optimization process and harm the final performance.
This paper proposes addressing the problem raised by random predictions by reshaping the standard cross-entropy to make it biased toward data points with a limited number of unique descriptive features. Our novel Bias Loss focuses the training on a set of valuable data points and prevents the vast number of samples with poor learning features from misleading the optimization process. Furthermore, to show the importance of diversity, we present a family of SkipNet models whose architectures are brought to boost the number of unique descriptors in the last layers. Our Skipnet-M can achieve 1% higher classification accuracy than MobileNetV3 Large.
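摘要没有给出Bias Loss的具体公式;下面是一个"按特征多样性加权交叉熵"的假设性草图,仅用于说明"降低低多样性样本对优化的影响"这一思想。以激活的方差作为多样性度量、以归一化方差作权重,均为本示例的假设,并非论文的原始定义。

```python
import numpy as np

def diversity_weighted_ce(logits, labels, features):
    """假设性草图:以归一化特征多样性为权重的交叉熵。

    特征多样性高的样本权重接近1,低多样性("随机预测"倾向)的样本被降权。
    """
    # 数值稳定的 log-softmax
    z = logits - logits.max(axis=1, keepdims=True)
    logp = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    ce = -logp[np.arange(len(labels)), labels]
    div = features.var(axis=1)            # 示例多样性度量:激活方差
    w = div / (div.max() + 1e-12)         # 归一化到 [0, 1]
    return (w * ce).mean()
```

极端情形下,激活完全恒定(方差为零)的样本对损失的贡献为零。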

【7】 Teaching a neural network with non-tunable exciton-polariton nodes 标题:用不可调谐的激子-极化子节点训练神经网络

作者:Andrzej Opala,Riccardo Panico,Vincenzo Ardizzone,Barbara Pietka,Jacek Szczytko,Daniele Sanvitto,Michał Matuszewski,Dario Ballarini 机构:Institute of Physics, Polish Academy of Sciences, Al. Lotnik´ow ,, PL-,-, Warsaw, Poland, Dipartimento di Matematica e Fisica E. De Giorgi, Universita del Salento, Campus Ecotekne, via Monteroni, Lecce , Italy 链接:https://arxiv.org/abs/2107.11156 摘要:与神经网络的软件模拟不同,硬件或神经形态实现往往可调性有限,甚至不可调。虽然这类网络有望在速度和能效方面带来巨大提升,但其性能受限于难以对其实施高效训练。我们提出了一个由不可调谐的激子-极化子节点组成的系统,以及一种高效的训练方法,该方法依赖于对非线性节点响应的精确测量,并随后使用反向传播算法。实验结果表明,与不使用反向传播的情况相比,MNIST手写数字基准的分类精度有了很大提高。 摘要:In contrast to software simulations of neural networks, hardware or neuromorphic implementations have often limited or no tunability. While such networks promise great improvements in terms of speed and energy efficiency, their performance is limited by the difficulty to apply efficient teaching. We propose a system of non-tunable exciton-polariton nodes and an efficient teaching method that relies on the precise measurement of the nonlinear node response and the subsequent use of the backpropagation algorithm. We demonstrate experimentally that the classification accuracy in the MNIST handwritten digit benchmark is greatly improved compared to the case where backpropagation is not used.

【8】 Constellation: Learning relational abstractions over objects for compositional imagination 标题:星座:学习对象上的关系抽象以进行构图想象

作者:James C. R. Whittington,Rishabh Kabra,Loic Matthey,Christopher P. Burgess,Alexander Lerchner 机构: Factorisedsensory representations are easily re-combined to represent 1UniversityofOxford 2WorkdoneatDeepMind 3DeepMind 4Wayve 链接:https://arxiv.org/abs/2107.11153 摘要:学习视觉场景的结构化表示,是目前连接感知与推理的主要瓶颈。虽然基于槽(slot)的模型已取得令人兴奋的进展,能够学习将场景分割成多组对象,但学习整个对象组的配置特性仍有待深入探索。为了解决这个问题,我们引入了Constellation——一个学习静态视觉场景关系抽象的网络,它使这些抽象在具体的感官细节之上得到泛化,从而为抽象关系推理提供了潜在基础。我们进一步证明,这一基础连同语言联想,提供了一种以新的方式想象感官内容的途径。这项工作是显式表示视觉关系并将其用于复杂认知过程的第一步。 摘要:Learning structured representations of visual scenes is currently a major bottleneck to bridging perception with reasoning. While there has been exciting progress with slot-based models, which learn to segment scenes into sets of objects, learning configurational properties of entire groups of objects is still under-explored. To address this problem, we introduce Constellation, a network that learns relational abstractions of static visual scenes, and generalises these abstractions over sensory particularities, thus offering a potential basis for abstract relational reasoning. We further show that this basis, along with language association, provides a means to imagine sensory content in new ways. This work is a first step in the explicit representation of visual relationships and using them for complex cognitive procedures.

【9】 LocalGLMnet: interpretable deep learning for tabular data 标题:LocalGLMnet:表格数据的可解释深度学习

作者:Ronald Richman,Mario V. Wüthrich 机构:Mario V. W¨uthrich† 链接:https://arxiv.org/abs/2107.11059 摘要:深度学习模型在统计建模中得到了广泛应用,因为它们产生了非常有竞争力的回归模型,通常优于经典统计模型(如广义线性模型)。深度学习模型的缺点是其解难以解释与说明,变量选择也不容易,因为深度学习模型在内部以不透明的方式完成特征工程和变量选择。受广义线性模型吸引人的结构启发,我们提出了一种新的网络架构,它与广义线性模型具有相似的特性,同时受益于表示学习而提供更优的预测能力。这种新架构支持对表格数据进行变量选择,并可解释校准后的深度学习模型;事实上,我们的方法提供了一种秉承Shapley值和积分梯度思想的加性分解。 摘要:Deep learning models have gained great popularity in statistical modeling because they lead to very competitive regression models, often outperforming classical statistical models such as generalized linear models. The disadvantage of deep learning models is that their solutions are difficult to interpret and explain, and variable selection is not easily possible because deep learning models solve feature engineering and variable selection internally in a nontransparent way. Inspired by the appealing structure of generalized linear models, we propose a new network architecture that shares similar features as generalized linear models, but provides superior predictive power benefiting from the art of representation learning. This new architecture allows for variable selection of tabular data and for interpretation of the calibrated deep learning model, in fact, our approach provides an additive decomposition in the spirit of Shapley values and integrated gradients.
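摘要描述的结构可概括为 μ(x) = β₀ + ⟨β(x), x⟩:回归系数 β(x) 由网络给出,逐项乘积 β_j(x)·x_j 即为可解释的加性分解。下面是该结构骨架的NumPy草图;示例中用恒等函数代替 β 网络,真实模型中它是训练得到的深度网络。

```python
import numpy as np

def local_glm_predict(x, beta_net, beta0=0.0):
    """LocalGLMnet式链接的骨架:mu(x) = beta0 + <beta(x), x>。

    返回预测值与逐特征贡献 beta_j(x) * x_j(即加性分解)。
    """
    beta = beta_net(x)            # 网络输出的"局部"回归系数
    contributions = beta * x      # 每个特征的可解释贡献
    return beta0 + contributions.sum(axis=-1), contributions

# 示例:beta 网络取恒等(每个系数为1),仅作占位
beta_net = lambda x: np.ones_like(x)
```

若某特征的 β_j(x) 在所有样本上都接近零,即可将其剔除——这正是该架构支持变量选择的机制。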

【10】 Ensemble of Convolution Neural Networks on Heterogeneous Signals for Sleep Stage Scoring 标题:基于非均匀信号的卷积神经网络集成用于睡眠阶段评分

作者:Enrique Fernandez-Blanco,Carlos Fernandez-Lozano,Alejandro Pazos,Daniel Rivero 机构:University of A Coruna, A Coruna, Spain, University of A Coruña, A Coruña, Spain 备注:16 pages, 11 tables and 7 figures 链接:https://arxiv.org/abs/2107.11045 摘要:多年来,有多种方法试图解决睡眠阶段自动评分的问题。尽管一次多导睡眠监测通常会采集十几种不同的信号,但这一问题主要是只利用其中的脑电图来解决的;其他被记录的信号则被大多数工作所忽略。本文探讨并比较了使用脑电图之外附加信号的益处。更具体地说,这项工作使用了包含5804名患者的SHHS-1数据集,其中肌电图与两路脑电图同步记录。为了比较结果,首先用不同的输入信号及其所有可能的组合对同一架构进行了评估。这些测试表明,使用多个信号(尤其是来自不同来源的信号)可以改善分类结果。此外,针对单个或多个信号的每种组合所得到的最佳模型被进一步用于集成模型,并对其性能进行比较,显示了使用这类多信号模型改进分类的优势。最佳整体模型是深度可分离卷积神经网络的集成,其准确率达到86.06%,Cohen's Kappa为0.80,$F_{1}$为0.77。到目前为止,这些是在完整数据集上的最佳结果,并且在数据集中最不常见类别的精确率和召回率上有显著提升。 摘要:Over the years, several approaches have tried to tackle the problem of performing an automatic scoring of the sleeping stages. Although any polysomnography usually collects over a dozen of different signals, this particular problem has been mainly tackled by using only the Electroencephalograms presented in those records. On the other hand, the other recorded signals have been mainly ignored by most works. This paper explores and compares the convenience of using additional signals apart from electroencephalograms. More specifically, this work uses the SHHS-1 dataset with 5,804 patients containing an electromyogram recorded simultaneously as two electroencephalograms. To compare the results, first, the same architecture has been evaluated with different input signals and all their possible combinations. These tests show how, using more than one signal especially if they are from different sources, improves the results of the classification. Additionally, the best models obtained for each combination of one or more signals have been used in ensemble models and, its performance has been compared showing the convenience of using these multi-signal models to improve the classification.
The best overall model, an ensemble of Depth-wise Separational Convolutional Neural Networks, has achieved an accuracy of 86.06\% with a Cohen's Kappa of 0.80 and a $F_{1}$ of 0.77. Up to date, those are the best results on the complete dataset and it shows a significant improvement in the precision and recall for the most uncommon class in the dataset.
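论文中的集成模型将多个单信号模型的输出加以融合。下面是该思路的一个假设性最小示意(并非论文的原始实现):用 numpy 对若干基模型的 softmax 概率取平均,再取 argmax 得到集成预测。

```python
import numpy as np

def softmax(z):
    # 数值稳定的 softmax(沿最后一维)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_per_model):
    """对多个单信号模型(例如分别在不同 EEG/EMG 通道上训练的模型)
    的类别概率取平均,再取 argmax 作为集成预测。
    logits_per_model: 若干个 (n_samples, n_classes) 数组。"""
    probs = np.mean([softmax(l) for l in logits_per_model], axis=0)
    return probs.argmax(axis=1), probs

# 玩具示例:两个"模型"对 3 个睡眠片段给出 5 个睡眠阶段的打分
rng = np.random.default_rng(0)
m1 = rng.normal(size=(3, 5))
m2 = rng.normal(size=(3, 5))
labels, probs = ensemble_predict([m1, m2])
print(labels.shape, probs.shape)  # (3,) (3, 5)
```

概率平均(soft voting)是集成单信号分类器时最常用的融合方式之一;实际系统中各基模型换成在不同信号组合上训练的卷积网络即可。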

【11】 On the Certified Robustness for Ensemble Models and Beyond 标题:集成模型及其他模型的认证鲁棒性

作者:Zhuolin Yang,Linyi Li,Xiaojun Xu,Bhavya Kailkhura,Tao Xie,Bo Li 机构: University of Illinois at Urbana-Champaign, USA, Lawrence Livermore National Laboratory, USA 备注:57 pages, 11 pages for main text 链接:https://arxiv.org/abs/2107.10873 摘要:最近的研究表明,深度神经网络(DNN)容易受到对抗样本的攻击,这些样本旨在通过添加小幅度的扰动来误导DNN。为了防御这种攻击,针对单个ML模型的经验性和理论性防御方法都已得到广泛研究。在这项工作中,我们旨在分析并给出集成ML模型的认证鲁棒性,以及不同集成协议具备鲁棒性的充分必要条件。尽管集成模型在经验上比单一模型更鲁棒,但令人惊讶的是,我们发现在认证鲁棒性方面,标准集成模型与单一模型相比只取得了微小的改进。因此,为了探讨保证得到可认证鲁棒集成ML模型的条件,我们首先证明了在模型光滑性假设下,多样化的梯度和较大的置信间隔是可认证鲁棒集成模型的充要条件。在此基础上,我们基于所提出的先集成后平滑(Ensemble-before-Smoothing)策略给出了有界模型光滑性分析。我们还证明了在温和的条件下,集成模型总能比单个基模型获得更高的认证鲁棒性。 摘要:Recent studies show that deep neural networks (DNN) are vulnerable to adversarial examples, which aim to mislead DNNs by adding perturbations with small magnitude. To defend against such attacks, both empirical and theoretical defense approaches have been extensively studied for a single ML model. In this work, we aim to analyze and provide the certified robustness for ensemble ML models, together with the sufficient and necessary conditions of robustness for different ensemble protocols. Although ensemble models are shown more robust than a single model empirically; surprisingly, we find that in terms of the certified robustness the standard ensemble models only achieve marginal improvement compared to a single model. Thus, to explore the conditions that guarantee to provide certifiably robust ensemble ML models, we first prove that diversified gradient and large confidence margin are sufficient and necessary conditions for certifiably robust ensemble models under the model-smoothness assumption. We then provide the bounded model-smoothness analysis based on the proposed Ensemble-before-Smoothing strategy. We also prove that an ensemble model can always achieve higher certified robustness than a single base model under mild conditions.
Inspired by the theoretical findings, we propose the lightweight Diversity Regularized Training (DRT) to train certifiably robust ensemble ML models. Extensive experiments show that our DRT enhanced ensembles can consistently achieve higher certified robustness than existing single and ensemble ML models, demonstrating the state-of-the-art certified L2-robustness on MNIST, CIFAR-10, and ImageNet datasets.
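DRT 的核心思想之一是鼓励基模型的梯度多样化。下面是一个假设性示意(并非论文的原始损失):以基模型输入梯度两两余弦相似度的均值作为多样性惩罚项;对线性基模型 f_k(x) = w_k·x,其输入梯度就是权重向量 w_k。

```python
import numpy as np

def gradient_diversity_penalty(grads):
    """假设性的梯度多样性惩罚:对各基模型输入梯度的两两余弦相似度
    取平均。梯度越"多样"(越接近正交),惩罚越小。"""
    penalty, n = 0.0, len(grads)
    for i in range(n):
        for j in range(i + 1, n):
            gi, gj = grads[i], grads[j]
            cos = gi @ gj / (np.linalg.norm(gi) * np.linalg.norm(gj) + 1e-12)
            penalty += cos
    return penalty / (n * (n - 1) / 2)

w1 = np.array([1.0, 0.0])
w2 = np.array([0.0, 1.0])   # 与 w1 正交:最大多样性
w3 = np.array([1.0, 0.0])   # 与 w1 相同:无多样性
print(gradient_diversity_penalty([w1, w2]))  # ~0.0
print(gradient_diversity_penalty([w1, w3]))  # ~1.0
```

实际训练时,这样的惩罚项会加到集成的分类损失上,用自动微分对深度基模型的输入梯度计算;此处仅演示惩罚项本身的行为。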

【12】 Multiclass versus Binary Differentially Private PAC Learning 标题:多分类与二分类差分隐私PAC学习

作者:Mark Bun,Marco Gaboardi,Satchit Sivakumar 链接:https://arxiv.org/abs/2107.10870 摘要:我们给出了一个从多分类差分隐私PAC学习到二分类差分隐私PAC学习的通用归约。我们将此变换应用于最近提出的一个二分类差分隐私PAC学习器,得到一个私有多分类学习器,其样本复杂度对多分类Littlestone维数呈多项式依赖,对类别数呈多对数依赖。这使得对这两个参数的依赖性比以往工作中的学习器有了指数级的改进。我们的证明将Ben-David等人[JCSS '95]的工作中定义的$\Psi$-维概念推广到在线设定,并探讨了它的一般性质。 摘要:We show a generic reduction from multiclass differentially private PAC learning to binary private PAC learning. We apply this transformation to a recently proposed binary private PAC learner to obtain a private multiclass learner with sample complexity that has a polynomial dependence on the multiclass Littlestone dimension and a poly-logarithmic dependence on the number of classes. This yields an exponential improvement in the dependence on both parameters over learners from previous work. Our proof extends the notion of $\Psi$-dimension defined in work of Ben-David et al. [JCSS '95] to the online setting and explores its general properties.
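作为背景直观(这不是论文中的归约构造,隐私部分也从略),"多分类归约为二分类"最常见的一种形式是标签二进制编码:对标签的每一位训练一个二分类器,预测时把各位拼回类别编号。

```python
import numpy as np

def fit_multiclass_via_binary(X, y, n_classes, fit_binary):
    """示意性的归约(并非论文的构造):把 K 分类拆成 ceil(log2 K)
    个二分类问题,每个二分类器预测标签二进制编码中的一位。"""
    n_bits = max(1, int(np.ceil(np.log2(n_classes))))
    models = [fit_binary(X, (y >> b) & 1) for b in range(n_bits)]
    def predict(Q):
        bits = [m(Q) for m in models]
        return sum(b << i for i, b in enumerate(bits))
    return predict

# 玩具二分类学习器:1-近邻
def fit_binary(X, t):
    def clf(Q):
        d = ((Q[:, None, :] - X[None, :, :]) ** 2).sum(-1)
        return t[d.argmin(axis=1)]
    return clf

X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([0, 1, 2, 3])
predict = fit_multiclass_via_binary(X, y, 4, fit_binary)
print(predict(X))  # 在训练点上恢复 [0 1 2 3]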

【13】 Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time 标题:局部SGD在多项式时间内优化过参数神经网络

作者:Yuyang Deng,Mehrdad Mahdavi 机构:Department of Computer Science and Engineering, The Pennsylvania State University 链接:https://arxiv.org/abs/2107.10868 摘要:本文证明了局部(S)GD(即FedAvg)能在多项式时间内优化具有修正线性单元(ReLU)激活函数的两层神经网络。尽管在通信高效的分布式优化中,局部SGD在优化一般光滑函数方面已经建立了收敛理论,但其在非光滑ReLU网络上的收敛性仍然缺乏完整的理论理解。在许多针对光滑函数的局部SGD分析中,梯度Lipschitz性是一个关键性质,它保证局部模型上的梯度不会远离平均模型上的梯度。然而,在具有非光滑ReLU激活函数的网络中,这一良好性质并不成立。我们证明,即使ReLU网络不具有梯度Lipschitz性质,在局部SGD的动态下,局部模型与平均模型上的梯度之差也不会变化太大。我们通过大量实验验证了理论结果。本文首次证明了局部SGD在非光滑函数上的收敛性,并将为深度神经网络联邦训练的优化理论提供启示。 摘要:In this paper we prove that Local (S)GD (or FedAvg) can optimize two-layer neural networks with Rectified Linear Unit (ReLU) activation function in polynomial time. Despite the established convergence theory of Local SGD on optimizing general smooth functions in communication-efficient distributed optimization, its convergence on non-smooth ReLU networks still eludes full theoretical understanding. The key property used in many Local SGD analysis on smooth function is gradient Lipschitzness, so that the gradient on local models will not drift far away from that on averaged model. However, this decent property does not hold in networks with non-smooth ReLU activation function. We show that, even though ReLU network does not admit gradient Lipschitzness property, the difference between gradients on local models and average model will not change too much, under the dynamics of Local SGD. We validate our theoretical results via extensive experiments. This work is the first to show the convergence of Local SGD on non-smooth functions, and will shed lights on the optimization theory of federated training of deep neural networks.
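局部SGD/FedAvg 的通信模式可以用一个玩具二次问题来示意(假设性示例,并非论文的实验设置):每个客户端先在本地做若干步梯度下降,服务器再对本地模型取平均。

```python
import numpy as np

def local_sgd(w0, client_grads, rounds=10, local_steps=5, lr=0.1):
    """最小化示意:客户端 k 最小化 0.5*||w - c_k||^2,
    每轮本地做 local_steps 步梯度下降,然后服务器平均各本地模型。"""
    w = w0.copy()
    for _ in range(rounds):
        local_models = []
        for grad in client_grads:
            wk = w.copy()
            for _ in range(local_steps):
                wk -= lr * grad(wk)       # 本地 SGD 步
            local_models.append(wk)
        w = np.mean(local_models, axis=0)  # 服务器平均(FedAvg 步)
    return w

# 两个客户端的最优点分别为 c1、c2;共识最优点是二者的均值
c1, c2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
w = local_sgd(np.zeros(2), [lambda w: w - c1, lambda w: w - c2])
print(np.round(w, 3))  # 收敛到约 [0.5, 0.5]
```

论文分析的正是这种"本地多步、周期性平均"的动态在非光滑 ReLU 网络上的收敛性;此处用光滑二次目标只为展示算法骨架。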

【14】 Regularising Inverse Problems with Generative Machine Learning Models 标题:用产生式机器学习模型正则化反问题

作者:Margaret Duff,Neill D. F. Campbell,Matthias J. Ehrhardt 机构:Department of Mathematics, University of Bath, UK, Department of Computer Science, University of Bath, UK 链接:https://arxiv.org/abs/2107.11191 摘要:在过去几年里,用深度神经网络方法求解逆成像问题已取得令人印象深刻的结果。在本文中,我们考虑在求解反问题的变分正则化方法中使用生成模型。所考虑的正则化项惩罚远离生成模型值域的图像,而该生成模型已学会生成与训练数据集相似的图像。我们将这一族方法命名为\textit{generative regularisers}(生成正则化项)。生成正则化的成功与否取决于生成模型的质量,因此我们提出了一组期望标准来评估模型并指导未来的研究。在数值实验中,我们按照这些标准评估了三种常见的生成模型:自编码器、变分自编码器和生成对抗网络。我们还在去模糊、反卷积和层析成像三类反问题上测试了三种不同的生成正则化项。我们证明,严格限制在生成器值域内的解能否成功,在很大程度上取决于生成模型的能力;而允许与生成器值域有小的偏差则能产生更一致的结果。 摘要:Deep neural network approaches to inverse imaging problems have produced impressive results in the last few years. In this paper, we consider the use of generative models in a variational regularisation approach to inverse problems. The considered regularisers penalise images that are far from the range of a generative model that has learned to produce images similar to a training dataset. We name this family \textit{generative regularisers}. The success of generative regularisers depends on the quality of the generative model and so we propose a set of desired criteria to assess models and guide future research. In our numerical experiments, we evaluate three common generative models, autoencoders, variational autoencoders and generative adversarial networks, against our desired criteria. We also test three different generative regularisers on the inverse problems of deblurring, deconvolution, and tomography. We show that the success of solutions restricted to lie exactly in the range of the generator is highly dependent on the ability of the generative model but that allowing small deviations from the range of the generator produces more consistent results.
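作为示意,下面在一个玩具反问题上演示生成正则化项的思想(假设性示例:这里的"生成器"取为线性映射,实际工作中是自编码器/VAE/GAN):在数据项之外惩罚解 x 偏离生成器值域的距离,并允许小偏差。

```python
import numpy as np

# 玩具反问题:从 2 个线性测量 y = A x 恢复 R^3 中的 x(欠定)
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])          # 前向算子(如模糊/投影)
B = np.ones((3, 1)) / np.sqrt(3.0)       # 玩具线性"生成器" G(z) = B z
x_true = B @ np.array([1.5])             # 真解位于生成器值域内
y = A @ x_true

# 目标:0.5*||Ax - y||^2 + 0.5*lam*||x - G(z)||^2,对 (x, z) 交替梯度下降
lam, lr = 1.0, 0.1
x, z = np.zeros(3), np.zeros(1)
for _ in range(2000):
    x -= lr * (A.T @ (A @ x - y) + lam * (x - B @ z))  # 数据项 + 值域惩罚
    z -= lr * lam * (B.T @ (B @ z - x))                # 把 G(z) 拉向 x
print(np.round(x, 3))  # 接近 x_true = [0.866, 0.866, 0.866]
```

惩罚项 ||x − G(z)||² 正是摘要所说的"允许与生成器值域有小的偏差":lam 越大,解越被约束在值域附近;lam → ∞ 对应严格限制在值域内的解。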

【15】 A comparison of combined data assimilation and machine learning methods for offline and online model error correction 标题:数据同化和机器学习相结合的离线和在线模型误差校正方法的比较

作者:Alban Farchi,Marc Bocquet,Patrick Laloyaux,Massimo Bonavita,Quentin Malartic 链接:https://arxiv.org/abs/2107.11114 摘要:最近的研究表明,将机器学习方法与数据同化相结合,仅利用系统的稀疏和带噪观测就可以重构一个动力系统。同样的方法也可以用来修正基于知识的模型的误差。由此得到的代理模型是混合的:统计部分补充了物理部分。在实践中,修正项既可以作为积分项加入(即加在模型的预解算子中),也可以直接加入物理模型的趋势项中。预解修正方法易于实现;趋势修正在技术上要求更高,特别是需要物理模型的伴随,但也更灵活。我们利用双尺度Lorenz模型对两种方法进行了比较。在长期预报实验中,采用预解修正和趋势修正的代理模型精度相近。相比之下,在数据同化实验中,使用趋势修正的代理模型显著优于使用预解修正的代理模型。最后,我们证明趋势修正开启了在线模型误差修正的可能性,即随着新观测的到来逐步改进模型。所得算法可以看作弱约束4D-Var的一种新表述。 摘要:Recent studies have shown that it is possible to combine machine learning methods with data assimilation to reconstruct a dynamical system using only sparse and noisy observations of that system. The same approach can be used to correct the error of a knowledge-based model. The resulting surrogate model is hybrid, with a statistical part supplementing a physical part. In practice, the correction can be added as an integrated term (i.e. in the model resolvent) or directly inside the tendencies of the physical model. The resolvent correction is easy to implement. The tendency correction is more technical, in particular it requires the adjoint of the physical model, but also more flexible. We use the two-scale Lorenz model to compare the two methods. The accuracy in long-range forecast experiments is somewhat similar between the surrogate models using the resolvent correction and the tendency correction. By contrast, the surrogate models using the tendency correction significantly outperform the surrogate models using the resolvent correction in data assimilation experiments. Finally, we show that the tendency correction opens the possibility to make online model error correction, i.e. improving the model progressively as new observations become available. The resulting algorithm can be seen as a new formulation of weak-constraint 4D-Var. 
We compare online and offline learning using the same framework with the two-scale Lorenz system, and show that with online learning, it is possible to extract all the information from sparse and noisy observations.
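两种模型误差修正方式的差别可以用一个一维玩具 ODE 来示意(假设性示例,并非论文的 Lorenz 实验):趋势修正把修正项加在每个内部积分步的趋势里,预解修正则在整个积分窗口之后一次性加到预解算子上。

```python
import numpy as np

def f_phys(x):          # 不完美的物理趋势(少了一半阻尼)
    return -1.0 * x

def delta(x):           # "学到的"修正;真实动力学为 dx/dt = -0.5 x
    return 0.5 * x

def window_tendency(x, dt, sub=10):
    # 趋势修正:修正项加在每个内部积分步里
    h = dt / sub
    for _ in range(sub):
        x = x + h * (f_phys(x) + delta(x))
    return x

def window_resolvent(x, dt, sub=10):
    # 预解修正:先只积分物理模型,再对整个窗口一次性加修正增量
    h = dt / sub
    x0 = x
    for _ in range(sub):
        x = x + h * f_phys(x)
    return x + dt * delta(x0)

x_t = x_r = 1.0
for _ in range(5):
    x_t = window_tendency(x_t, 0.2)
    x_r = window_resolvent(x_r, 0.2)
truth = np.exp(-0.5)    # dx/dt = -0.5 x 在 t = 1 的精确解
print(truth, x_t, x_r)  # 趋势修正更接近真值
```

在这个玩具问题上,趋势修正与真实动力学逐步一致,因此误差更小;预解修正把一个窗口的误差集中在末端修正,与摘要中"趋势修正更灵活、同化实验中表现更好"的结论方向一致。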

【16】 Learning the structure of wind: A data-driven nonlocal turbulence model for the atmospheric boundary layer 标题:学习风的结构:大气边界层的数据驱动的非局部湍流模式

作者:Brendan Keith,Ustim Khristenko,Barbara Wohlmuth 机构: 2)Department of Mathematics, Technical University of Munich 链接:https://arxiv.org/abs/2107.11046 摘要:提出了一种新的数据驱动的大气边界层模拟方法。这种方法导致了一个非局部的各向异性的合成湍流模型,我们称之为深快速畸变(DRD)模型。我们的方法依赖于一个算子回归问题,该问题描述了由神经网络部分参数化的非局部协方差核的一般族中的最佳拟合候选。这一族协方差核是用傅里叶空间表示的,它是在很高的雷诺数下由Navier-Stokes方程的近似解得到的。这个家族的每个成员都具有重要的物理性质,如质量守恒和真实的能量级联。DRD模型可以用野外实验的噪声数据进行校正。经标定后,该模型可用于合成湍流速度场。为此,我们提出了一种新的基于区域分解的数值方法,该方法可以利用DRD模型和其他模型产生可伸缩的、内存高效的湍流。我们用1968年空军剑桥研究实验室堪萨斯实验的滤波和噪声数据证明了我们方法的鲁棒性。利用这些数据,我们见证了DRD模型的非凡准确性,尤其是与国际电工委员会标准相比。 摘要:We develop a novel data-driven approach to modeling the atmospheric boundary layer. This approach leads to a nonlocal, anisotropic synthetic turbulence model which we refer to as the deep rapid distortion (DRD) model. Our approach relies on an operator regression problem which characterizes the best fitting candidate in a general family of nonlocal covariance kernels parameterized in part by a neural network. This family of covariance kernels is expressed in Fourier space and is obtained from approximate solutions to the Navier--Stokes equations at very high Reynolds numbers. Each member of the family incorporates important physical properties such as mass conservation and a realistic energy cascade. The DRD model can be calibrated with noisy data from field experiments. After calibration, the model can be used to generate synthetic turbulent velocity fields. To this end, we provide a new numerical method based on domain decomposition which delivers scalable, memory-efficient turbulence generation with the DRD model as well as others. We demonstrate the robustness of our approach with both filtered and noisy data coming from the 1968 Air Force Cambridge Research Laboratory Kansas experiments. Using this data, we witness exceptional accuracy with the DRD model, especially when compared to the International Electrotechnical Commission standard.
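DRD 模型在傅立叶空间中参数化协方差核以生成合成湍流。下面是这一思路的一个极简示意(假设性示例,并非论文的核族或其神经网络参数化):用给定的功率谱在频域给白噪声"着色",生成一维平稳高斯随机场。

```python
import numpy as np

def synthetic_field(n, spectrum, seed=0):
    """用指定功率谱对白噪声做频域着色,生成一维平稳高斯随机场。"""
    rng = np.random.default_rng(seed)
    white = rng.normal(size=n)
    W = np.fft.rfft(white)              # 白噪声的频域表示
    k = np.fft.rfftfreq(n)              # 对应的频率
    amp = np.sqrt(spectrum(k))          # 幅度 = 功率谱的平方根
    return np.fft.irfft(W * amp, n)

# 类 Kolmogorov 的 -5/3 谱(在 k=0 处做正则化),仅作为
# DRD 所参数化的协方差核的一个粗略替身
u = synthetic_field(1024, lambda k: (np.abs(k) + 1e-3) ** (-5.0 / 3.0))
print(u.shape)  # (1024,)
```

真实模型中,谱的形状(以及各向异性、质量守恒等约束)由部分神经网络参数化并用实测数据标定;这里的固定幂律谱只是演示"谱着色"这一生成机制。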

【17】 Deep Learning Based Reconstruction of Total Solar Irradiance 标题:基于深度学习的太阳总辐照度重建

作者:Yasser Abduallah,Jason T. L. Wang,Yucong Shen,Khalid A. Alobaid,Serena Criscuoli,Haimin Wang 机构:Institute for Space Weather Sciences, New Jersey Institute of Technology, USA 备注:8 pages, 11 figures 链接:https://arxiv.org/abs/2107.11042 摘要:地球的主要能量来源是太阳产生的辐射能,称为太阳辐照度;当测量全部辐射时,称为太阳总辐照度(TSI)。太阳辐照度的微小变化会对地球的气候和大气产生重大影响。因此,研究和测量太阳辐照度对于理解气候变化和太阳变率至关重要。目前已发展了几种在长短时间尺度上重建太阳总辐照度的方法;然而,它们都是基于物理的,并依赖于不超过9000年的数据可用性。在本文中,我们提出了一种称为TSInet的新方法,通过深度学习在长短时间尺度上重建太阳总辐照度,其时间跨度可以超越物理模型的数据可用范围。在已有数据上,我们的方法与最新的基于物理的重建模型非常吻合。据我们所知,这是首次利用深度学习重建超过9000年的太阳总辐照度。 摘要:The Earth's primary source of energy is the radiant energy generated by the Sun, which is referred to as solar irradiance, or total solar irradiance (TSI) when all of the radiation is measured. A minor change in the solar irradiance can have a significant impact on the Earth's climate and atmosphere. As a result, studying and measuring solar irradiance is crucial in understanding climate changes and solar variability. Several methods have been developed to reconstruct total solar irradiance for long and short periods of time; however, they are physics-based and rely on the availability of data, which does not go beyond 9,000 years. In this paper we propose a new method, called TSInet, to reconstruct total solar irradiance by deep learning for short and long periods of time that span beyond the physical models' data availability. On the data that are available, our method agrees well with the state-of-the-art physics-based reconstruction models. To our knowledge, this is the first time that deep learning has been used to reconstruct total solar irradiance for more than 9,000 years.

其他(11篇)

【1】 Multi-Channel Automatic Music Transcription Using Tensor Algebra 标题:基于张量代数的多通道自动音乐转录

作者:Marmoret Axel,Bertin Nancy,Cohen Jeremy 备注:40 pages, 14 figures, 5 tables, code can be found at: this https URL 链接:https://arxiv.org/abs/2107.11250 摘要:音乐是一门艺术,每位听众都以独特的方式从声音信号中感知它;同时,也存在音乐乐谱这样的标准来描述它。即使人类能够完成这种转录,其时间和精力成本也很高,而随着互联网兴起带来的信息爆炸,代价变得更高。因此,研究正朝着自动音乐转录(Automatic Music Transcription)的方向发展。虽然这项任务在单音符的情况下被认为已经解决,但当音符相互叠加形成和弦时,它仍然是开放问题。本报告旨在发展一些现有的音乐转录技术,特别是矩阵分解,并引入多通道自动音乐转录的概念。这一概念将用称为张量的数学对象来探讨。 摘要:Music is an art, perceived in unique ways by every listener, coming from acoustic signals. In the meantime, standards as musical scores exist to describe it. Even if humans can make this transcription, it is costly in terms of time and efforts, even more with the explosion of information consecutively to the rise of the Internet. In that sense, researches are driven in the direction of Automatic Music Transcription. While this task is considered solved in the case of single notes, it is still open when notes superpose themselves, forming chords. This report aims at developing some of the existing techniques towards Music Transcription, particularly matrix factorization, and introducing the concept of multi-channel automatic music transcription. This concept will be explored with mathematical objects called tensors.
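报告中提到的矩阵分解路线通常指非负矩阵分解(NMF):把幅度谱图 V 近似分解为音符模板 W 与激活矩阵 H 的乘积。下面是经典乘法更新规则(Lee & Seung)的最小示意实现:

```python
import numpy as np

def nmf(V, rank, iters=200, seed=0):
    """非负矩阵分解 V ~ W @ H,经典乘法更新。
    在转录场景中,V 是幅度谱图,W 是音符频谱模板,H 是其随时间的激活。"""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + 0.1
    H = rng.random((rank, m)) + 0.1
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# 玩具"谱图":两个频谱模板在不同时间段激活
w1, w2 = np.array([1.0, 0.0, 0.5]), np.array([0.0, 1.0, 0.5])
V = np.outer(w1, [1, 1, 0, 0]) + np.outer(w2, [0, 0, 1, 1])
W, H = nmf(V, rank=2)
print(np.linalg.norm(V - W @ H))  # 重构误差很小
```

乘法更新保证 W、H 始终非负且目标单调不增;论文讨论的多通道情形把这一分解推广到张量(例如 PARAFAC/Tucker 类分解)。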

【2】 Exploring Deep Registration Latent Spaces 标题:探索深度配准潜在空间

作者:Théo Estienne,Maria Vakalopoulou,Stergios Christodoulidis,Enzo Battistella,Théophraste Henry,Marvin Lerousseau,Amaury Leroy,Guillaume Chassagnon,Marie-Pierre Revel,Nikos Paragios,Eric Deutsch 机构:Deutsch, Universit´e Paris-Saclay, CentraleSup´elec, Math´ematiques et Informatique pour la, Complexit´e et les Systemes, Inria Saclay, Gif-sur-Yvette, France., Universit´e Paris-Saclay, Institut Gustave Roussy, Inserm, Radioth´erapie 备注:13 pages, 5 figures + 3 figures in supplementary materials Accepted to DART 2021 workshop 链接:https://arxiv.org/abs/2107.11238 摘要:深层神经网络的可解释性是该领域最具挑战性和最有趣的问题之一。在本研究中,我们主要探讨基于深度学习的配准方法的可解释性。特别是,通过适当的模型结构和使用简单的线性投影,我们分解了编码空间,生成了一个新的基,并从经验上证明了这个基捕获了各种分解的解剖感知几何变换。我们使用两种不同的数据集进行实验,重点放在肺和海马MRI上。我们证明了这种方法可以分解正交空间中注册管道的高度卷积的潜在空间,并具有一些有趣的性质。我们希望这项工作能为更好地理解基于深度学习的注册方法提供一些启示。 摘要:Explainability of deep neural networks is one of the most challenging and interesting problems in the field. In this study, we investigate the topic focusing on the interpretability of deep learning-based registration methods. In particular, with the appropriate model architecture and using a simple linear projection, we decompose the encoding space, generating a new basis, and we empirically show that this basis captures various decomposed anatomically aware geometrical transformations. We perform experiments using two different datasets focusing on lungs and hippocampus MRI. We show that such an approach can decompose the highly convoluted latent spaces of registration pipelines in an orthogonal space with several interesting properties. We hope that this work could shed some light on a better understanding of deep learning-based registration methods.

【3】 OLR 2021 Challenge: Datasets, Rules and Baselines 标题:OLR 2021挑战:数据集、规则和基线

作者:Binling Wang,Wenxuan Hu,Jing Li,Yiming Zhi,Zheng Li,Qingyang Hong,Lin Li,Dong Wang,Liming Song,Cheng Yang 机构:† School of Informatics, Xiamen University, ‡ School of Electronic Science and Engineering, Xiamen University, § Center for Speech and Language Technologies, Tsinghua University, ¶ Beijing National Research Center for Information Science and Technology 备注:arXiv admin note: text overlap with arXiv:2006.03473, arXiv:1907.07626, arXiv:1806.00616, arXiv:1706.09742 链接:https://arxiv.org/abs/2107.11113 摘要:本文介绍了第六届东方语言识别(OLR)2021挑战赛,旨在提高多语言场景下语言识别系统和语音识别系统的性能。本文介绍了数据概况、四项任务、两条基线和评测原则。除了语言识别(LID)任务外,多语言自动语音识别(ASR)任务首次被引入OLR 2021挑战赛。今年的挑战赛聚焦于更实际、更具挑战性的问题,共有四项任务:(1)受限LID,(2)无约束LID,(3)受限多语言ASR,(4)无约束多语言ASR。分别为LID任务和多语言ASR任务提供了基线。LID基线系统是用Pytorch构建的扩展TDNN x-vector模型。多语言ASR基线系统是一个基于Transformer的端到端模型。这些配方(recipe)将在线发布,供参赛者构建自己的LID或ASR系统。基线结果表明,这些任务具有相当大的挑战性,需要付出更多努力才能取得更好的性能。 摘要:This paper introduces the sixth Oriental Language Recognition (OLR) 2021 Challenge, which intends to improve the performance of language recognition systems and speech recognition systems within multilingual scenarios. The data profile, four tasks, two baselines, and the evaluation principles are introduced in this paper. In addition to the Language Identification (LID) tasks, multilingual Automatic Speech Recognition (ASR) tasks are introduced to OLR 2021 Challenge for the first time. The challenge this year focuses on more practical and challenging problems, with four tasks: (1) constrained LID, (2) unconstrained LID, (3) constrained multilingual ASR, (4) unconstrained multilingual ASR. Baselines for LID tasks and multilingual ASR tasks are provided, respectively. The LID baseline system is an extended TDNN x-vector model constructed with Pytorch. A transformer-based end-to-end model is provided as the multilingual ASR baseline system. These recipes will be online published, and available for participants to construct their own LID or ASR systems. 
The baseline results demonstrate that those tasks are rather challenging and deserve more effort to achieve better performance.

【4】 Reservoir Computing Approach for Gray Images Segmentation 标题:一种用于灰度图像分割的储层计算方法

作者:Petia Koprinkova-Hristova 机构:Institute of Information and Communication Technologies, Bulgarian Academy of Sciences, Sofia, Bulgaria 备注:12 pages, 7 figures, submitted to conference ICANN 2021 but not accepted 链接:https://arxiv.org/abs/2107.11077 摘要:提出了一种新的灰度图像分割方法。该方法基于回声状态网络(Echo State Network),从图像像素的单一特征(即其强度值)中提取多个特征。新提取的特征——储备池平衡态——揭示了隐藏的图像特性,从而通过聚类算法改进图像分割。此外,实验还表明,储备池的内在可塑性调节使其平衡态与原始图像的强度分布相匹配,从而获得更好的分割效果。该方法在基准图像Lena上进行了测试。 摘要:The paper proposes a novel approach for gray scale images segmentation. It is based on multiple features extraction from single feature per image pixel, namely its intensity value, using Echo state network. The newly extracted features -- reservoir equilibrium states -- reveal hidden image characteristics that improve its segmentation via a clustering algorithm. Moreover, it was demonstrated that the intrinsic plasticity tuning of reservoir fits its equilibrium states to the original image intensity distribution thus allowing for its better segmentation. The proposed approach is tested on the benchmark image Lena.
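论文用储备池的平衡态作为单个像素强度的多维特征。下面是该想法的一个假设性最小示意:对恒定输入迭代状态映射至(近似)不动点;将 W 缩放到谱范数小于 1,保证该映射是压缩映射、不动点唯一。

```python
import numpy as np

def reservoir_equilibrium(u, W, w_in, iters=200):
    # 对恒定输入 u,迭代状态映射 x <- tanh(W x + w_in * u) 至近似不动点;
    # 该不动点向量即该标量输入(如像素强度)的多维特征
    x = np.zeros(W.shape[0])
    for _ in range(iters):
        x = np.tanh(W @ x + w_in * u)
    return x

rng = np.random.default_rng(0)
n = 20
W = rng.normal(size=(n, n))
W *= 0.8 / np.linalg.norm(W, 2)   # 谱范数 < 1:状态映射为压缩映射
w_in = rng.normal(size=n)

f_dark = reservoir_equilibrium(0.1, W, w_in)     # "暗"像素的特征
f_bright = reservoir_equilibrium(0.9, W, w_in)   # "亮"像素的特征
print(f_dark.shape)  # (20,):一个标量强度被展开成 20 维特征
```

随后对所有像素的平衡态特征做聚类(如 k-means)即可得到分割;论文中另加了内在可塑性调节,这里未实现。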

【5】 FNetAR: Mixing Tokens with Autoregressive Fourier Transforms 标题:FNetAR:混合符号与自回归傅立叶变换

作者:Tim Lou,Michael Park,Mohammad Ramezanali,Vincent Tang 机构:X-Mechanics, Cresskill, NJ, LiveRamp, San Fransisco, CA, Appliedinfo Partners, Somerset, NJ, Salesforce, San Fransisco, CA, SamsungNEXT, New York, NY 备注:final experimental results forthcoming 链接:https://arxiv.org/abs/2107.10932 摘要:在本文中,我们研究了FNet算法的自回归推广,其中标准Transformer结构的自注意力层被基于傅立叶变换的简单稀疏均匀采样过程所取代。使用Wikitext-103基准,我们证明了FNetAR在因果语言建模任务上保持了最先进的性能(25.8 ppl),而相比Transformer-XL基线(24.2 ppl)只使用了一半数量的自注意力层,从而进一步证明了带有高度复合注意力机制的深度神经网络存在冗余。自回归傅立叶变换很可能可用于大多数基于Transformer的时间序列预测模型的参数削减。 摘要:In this note we examine the autoregressive generalization of the FNet algorithm, in which self-attention layers from the standard Transformer architecture are substituted with a trivial sparse-uniform sampling procedure based on Fourier transforms. Using the Wikitext-103 benchmark, we demonstrate that FNetAR retains state-of-the-art performance (25.8 ppl) on the task of causal language modeling compared to a Transformer-XL baseline (24.2 ppl) with only half the number of self-attention layers, thus providing further evidence for the superfluity of deep neural networks with heavily compounded attention mechanisms. The autoregressive Fourier transform could likely be used for parameter reduction on most Transformer-based time-series prediction models.
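FNet 用序列维与隐藏维上的二维 FFT 取实部来替代自注意力做 token 混合;其自回归推广必须保证因果性。下面是一个粗略示意(假设性构造,并非论文的稀疏均匀采样过程):位置 i 只对前缀 x[:i+1] 做 FNet 混合,从而天然满足因果约束。

```python
import numpy as np

def fnet_mixing(x):
    # FNet 的 token 混合:先沿序列轴、再沿隐藏轴做 FFT,取实部
    return np.fft.fft(np.fft.fft(x, axis=0), axis=1).real

def fnetar_mixing(x):
    # 自回归(因果)变体的示意:位置 i 只看到前缀 x[:i+1]
    return np.vstack([fnet_mixing(x[: i + 1])[-1] for i in range(len(x))])

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))   # (序列长度, 隐藏维)
y = fnetar_mixing(x)
print(y.shape)  # (6, 4)
```

因果性可以直接验证:改动最后一个 token 不会影响之前任何位置的输出——这正是自注意力中因果掩码所保证的性质。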

【6】 Discovering Sparse Interpretable Dynamics from Partial Observations 标题:从部分观测中发现稀疏可解释动力学

作者:Peter Y. Lu,Joan Ariño,Marin Soljačić 机构:Department of Physics, Massachusetts Institute of Technology, Cambridge, MA, USA, Department of Physics, Universitat Politecnica de Catalunya, Barcelona, Spain 备注:8 pages, 4 figures 链接:https://arxiv.org/abs/2107.10879 摘要:识别非线性动力系统的控制方程,既是理解系统物理特性的关键,也是构建能够很好推广到可用数据之外的精确动力学模型的关键。我们提出了一个仅使用部分观测来发现这些控制方程的机器学习框架,它将用于状态重建的编码器与稀疏符号模型相结合。测试表明,该方法能成功地重建完整的系统状态,并识别多种ODE和PDE系统的潜在动力学。 摘要:Identifying the governing equations of a nonlinear dynamical system is key to both understanding the physical features of the system and constructing an accurate model of the dynamics that generalizes well beyond the available data. We propose a machine learning framework for discovering these governing equations using only partial observations, combining an encoder for state reconstruction with a sparse symbolic model. Our tests show that this method can successfully reconstruct the full system state and identify the underlying dynamics for a variety of ODE and PDE systems.
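摘要中的稀疏符号模型通常用类似 SINDy 的"序贯阈值最小二乘"实现。下面单独演示这一步(示意性质,不含论文中的编码器与部分观测部分):把观测到的导数回归到候选项库上,并反复把小系数置零以得到稀疏模型。

```python
import numpy as np

def stlsq(Theta, dxdt, threshold=0.1, iters=10):
    """序贯阈值最小二乘(SINDy 的核心步骤):
    先做最小二乘,再反复将小于阈值的系数置零并在剩余项上重拟合。"""
    xi = np.linalg.lstsq(Theta, dxdt, rcond=None)[0]
    for _ in range(iters):
        small = np.abs(xi) < threshold
        xi[small] = 0.0
        big = ~small
        if big.any():
            xi[big] = np.linalg.lstsq(Theta[:, big], dxdt, rcond=None)[0]
    return xi

# 从采样数据中恢复 dx/dt = -2 x + 0.5 x^3,候选库为 [1, x, x^2, x^3]
x = np.linspace(-1.5, 1.5, 50)
Theta = np.column_stack([np.ones_like(x), x, x**2, x**3])
dxdt = -2.0 * x + 0.5 * x**3
xi = stlsq(Theta, dxdt)
print(np.round(xi, 3))  # 约为 [0, -2, 0, 0.5]
```

论文的贡献在于:当只能观测到部分状态时,先用编码器从观测重建完整状态,再在重建状态上做这类稀疏回归。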

【7】 Filament Plots for Data Visualization 标题:用于数据可视化的丝状图

作者:Nate Strawn 机构:Department of Mathematics and Statistics, Georgetown University, Washington, D.C., USA 备注:33 pages, 13 figures 链接:https://arxiv.org/abs/2107.10869 摘要:我们通过考虑由Frenet-Serret方程生成、并由最优光滑的2D Andrews图诱导的曲线,构造了Andrews图的一个计算成本低廉的3D扩展。我们考虑从欧几里德数据空间到2D曲线的无限维空间的线性等距映射,并在给定数据集上参数化(平均意义下)产生最优光滑曲线的线性等距映射。这组最优等距映射具有多个自由度,并且(利用最近关于广义高斯和的结果)我们确定了其中的一个特殊成员,它具有渐近投影"巡回(tour)"性质。最后,我们考虑由这些2D Andrews图诱导的单位长度3D曲线(丝线),其中线性等距性质将距离保留为"相对总平方曲率"。本文最后以若干数据集展示了丝状图。代码位于https://github.com/n8epi/filaments 摘要:We construct a computationally inexpensive 3D extension of Andrew's plots by considering curves generated by Frenet-Serret equations and induced by optimally smooth 2D Andrew's plots. We consider linear isometries from a Euclidean data space to infinite dimensional spaces of 2D curves, and parametrize the linear isometries that produce (on average) optimally smooth curves over a given dataset. This set of optimal isometries admits many degrees of freedom, and (using recent results on generalized Gauss sums) we identify a particular member of this set which admits an asymptotic projective "tour" property. Finally, we consider the unit-length 3D curves (filaments) induced by these 2D Andrew's plots, where the linear isometry property preserves distances as "relative total square curvatures". This work concludes by illustrating filament plots for several datasets. Code is available at https://github.com/n8epi/filaments
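作为背景,经典的 2D Andrews 图把每个数据点投影到傅立叶基上得到一条曲线;本文的构造在此基础上扩展到 3D 丝线。下面是经典 Andrews 曲线的最小实现(示意),并数值验证其按比例的等距性质:曲线的 L2 能量等于 π 乘以数据点的欧氏范数平方。

```python
import numpy as np

def andrews_curve(x, t):
    # 经典 Andrews 曲线:f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + ...
    basis = [np.full_like(t, 1.0 / np.sqrt(2.0))]
    k = 1
    while len(basis) < len(x):
        basis.append(np.sin(k * t))
        if len(basis) < len(x):
            basis.append(np.cos(k * t))
        k += 1
    return sum(xi * b for xi, b in zip(x, basis))

x = np.array([1.0, 2.0, 0.5, -1.0])
t = np.linspace(-np.pi, np.pi, 201)
f = andrews_curve(x, t)

# 等距性质(相差常数 pi):曲线能量 ≈ pi * ||x||^2
dt = t[1] - t[0]
energy = float(np.sum(f[:-1] ** 2) * dt)   # 周期函数上的黎曼和 = 梯形法
print(round(energy / np.pi, 3), float(x @ x))  # 两者都约为 6.25
```

傅立叶基的正交性正是摘要所说"线性等距映射"的来源:数据空间中的欧氏距离在曲线空间中被按比例保留。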

【8】 Joint Shapley values: a measure of joint feature importance 标题:关节Shapley值:关节特征重要性的度量

作者:Chris Harris,Richard Pymar,Colin Rowat 机构:Visual Alpha, Tokyo, Japan, Economics, Mathematics and Statistics, Birkbeck College University of London, UK, University of Birmingham, UK 备注:Source code available at this https URL 链接:https://arxiv.org/abs/2107.11357 摘要:Shapley值是可解释人工智能中使用最广泛的模型无关特征重要性度量之一:它有明确的公理基础,保证唯一存在,并且可以清晰地解释为一个特征对模型预测的平均影响。我们引入了联合Shapley值,它直接推广了Shapley公理。这保留了经典Shapley值的直觉:联合Shapley值度量一组特征对模型预测的平均影响。我们证明了在任意解释阶数下联合Shapley值的唯一性。博弈上的结果表明,联合Shapley值与现有的交互指数给出不同的洞见,后者评估的是一个特征在一组特征之内的效果。由此,在ML归因问题中导出联合Shapley值,使我们第一次能够度量特征集合对模型预测的联合影响。在具有二元特征的数据集中,我们提出了一种保留效率性质、经存在性调整(presence-adjusted)的全局值计算方法。 摘要:The Shapley value is one of the most widely used model-agnostic measures of feature importance in explainable AI: it has clear axiomatic foundations, is guaranteed to uniquely exist, and has a clear interpretation as a feature's average effect on a model's prediction. We introduce joint Shapley values, which directly extend the Shapley axioms. This preserves the classic Shapley value's intuitions: joint Shapley values measure a set of features' average effect on a model's prediction. We prove the uniqueness of joint Shapley values, for any order of explanation. Results for games show that joint Shapley values present different insights from existing interaction indices, which assess the effect of a feature within a set of features. Deriving joint Shapley values in ML attribution problems thus gives us the first measure of the joint effect of sets of features on model predictions. In a dataset with binary features, we present a presence-adjusted method for calculating global values that retains the efficiency property.
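作为背景,经典(单特征)Shapley 值可以通过枚举所有排列精确计算:每个参与者的值是其边际贡献在所有到达顺序上的平均。论文的联合 Shapley 值把这一公理体系推广到特征集合;下面只演示经典值的暴力计算(小规模可行):

```python
from itertools import permutations

def shapley_values(players, value):
    """经典(单特征)Shapley 值的精确暴力计算:
    枚举所有到达顺序,对每个参与者的边际贡献取平均。"""
    phi = {p: 0.0 for p in players}
    perms = list(permutations(players))
    for order in perms:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            phi[p] += value(frozenset(coalition)) - before
    return {p: v / len(perms) for p, v in phi.items()}

# 玩具博弈:当且仅当特征 "a" 和 "b" 同时在场时 v(S) = 1
v = lambda S: 1.0 if {"a", "b"} <= S else 0.0
phi = shapley_values(["a", "b", "c"], v)
print(phi)  # a 与 b 平分贡献(各 0.5),c 为 0
```

这个例子也说明了论文的动机:经典值把 {a, b} 的协同效应摊到单个特征上,而联合 Shapley 值直接为特征集合 {a, b} 赋值。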

【9】 Introducing: DeepHead, Wide-band Electromagnetic Imaging Paradigm 标题:简介:深头、宽带电磁成像范式

作者:A. Al-Saffar,L. Guo,A. Abbosh 机构:School of Information Technology and Electrical Engineering (ITEE), University of Queensland 备注:None 链接:https://arxiv.org/abs/2107.11107 摘要:微波频段的电磁医学成像是一个以两方面问题著称的难题:1)不稳定性;2)欠定性。这一双重问题通过一个双管齐下的方案来解决:该方案使用双重压缩,最大限度地利用廉价的无标注数据,a)提供所需的先验信息以缓解欠定性,b)降低推理对输入的敏感性。其结果是一个输出分辨率高且稳定的求解器。DeepHead是在微波脑成像背景下对所提出范式的完全数据驱动实现。它利用分布在宽频带上的输入信号,在期望的单一频率上推断大脑的介电分布。通过仿真和人体志愿者实验对模型性能进行了评估。所做的推断在仿真情形下与真实介电分布对照,在真实情形下与志愿者的金标准MRI/CT成像对照。 摘要:Electromagnetic medical imaging in the microwave regime is a hard problem notorious for 1) instability 2) under-determinism. This two-pronged problem is tackled with a two-pronged solution that uses double compression to maximally utilizing the cheap unlabelled data to a) provide a priori information required to ease under-determinism and b) reduce sensitivity of inference to the input. The result is a stable solver with a high resolution output. DeepHead is a fully data-driven implementation of the paradigm proposed in the context of microwave brain imaging. It infers the dielectric distribution of the brain at a desired single frequency while making use of an input that spreads over a wide band of frequencies. The performance of the model is evaluated with both simulations and human volunteers experiments. The inference made is juxtaposed with ground-truth dielectric distribution in simulation case, and the golden MRI / CT imaging modalities of the volunteers in real-world case.

【10】 The decomposition of the higher-order homology embedding constructed from the k-Laplacian 标题:由k-Laplacian构造的高阶同调嵌入的分解

作者:Yu-Chia Chen,Marina Meilă 机构:Electrical & Computer Engineering, University of Washington, Seattle, WA, Department of Statistics 链接:https://arxiv.org/abs/2107.10970 摘要:$k$阶Laplacian $\mathbf{\mathcal L}_k$ 的零空间称为"$k$阶同调向量空间",它编码了流形或网络的非平凡拓扑。因此,理解同调嵌入的结构可以从数据中揭示几何或拓扑信息。对图Laplacian $\mathbf{\mathcal L}_0$ 零空间嵌入的研究激发了新的研究和应用,例如具有理论保证的谱聚类算法和随机块模型的估计。在这项工作中,我们研究了$k$阶同调嵌入的几何,并重点关注与谱聚类相似的情形。也就是说,我们将流形的连通和(connected sum)分析为对其各部分同调嵌入直和的扰动。我们提出了一种算法,将同调嵌入分解为与流形最简单拓扑分量对应的子空间。该框架被应用于"最短同调环路检测"问题,这是一个一般情形下NP难的问题。我们的谱环路检测算法比现有方法具有更好的可扩展性,并且在点云和图像等多种数据上都有效。 摘要:The null space of the $k$-th order Laplacian $\mathbf{\mathcal L}_k$, known as the {\em $k$-th homology vector space}, encodes the non-trivial topology of a manifold or a network. Understanding the structure of the homology embedding can thus disclose geometric or topological information from the data. The study of the null space embedding of the graph Laplacian $\mathbf{\mathcal L}_0$ has spurred new research and applications, such as spectral clustering algorithms with theoretical guarantees and estimators of the Stochastic Block Model. In this work, we investigate the geometry of the $k$-th homology embedding and focus on cases reminiscent of spectral clustering. Namely, we analyze the {\em connected sum} of manifolds as a perturbation to the direct sum of their homology embeddings. We propose an algorithm to factorize the homology embedding into subspaces corresponding to a manifold's simplest topological components. The proposed framework is applied to the {\em shortest homologous loop detection} problem, a problem known to be NP-hard in general. Our spectral loop detection algorithm scales better than existing methods and is effective on diverse data such as point clouds and images.
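最简单的情形是 0 阶:图 Laplacian $\mathbf{\mathcal L}_0$ 的零空间维数等于连通分量数,且零空间嵌入在每个分量上取常值——这正是谱聚类起作用的原因。下面用两个互不相连的三角形做一个最小演示:

```python
import numpy as np

def graph_laplacian(adj):
    # 组合图 Laplacian:L0 = D - A
    deg = np.diag(adj.sum(axis=1))
    return deg - adj

# 两个互不相连的三角形:L0 零空间(0 阶同调向量空间)的维数
# 等于连通分量数 = 2
tri = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
adj = np.zeros((6, 6))
adj[:3, :3] = tri
adj[3:, 3:] = tri
L0 = graph_laplacian(adj)
eigvals, eigvecs = np.linalg.eigh(L0)       # eigh 按升序返回特征值
n_zero = int(np.sum(eigvals < 1e-8))
print(n_zero)  # 2

# 零空间嵌入在每个连通分量上取常值:对嵌入坐标做聚类即可恢复分量
emb = eigvecs[:, :n_zero]
print(np.round(emb[:3], 3))
print(np.round(emb[3:], 3))
```

论文研究的是这一图景在高阶 $\mathbf{\mathcal L}_k$ 上的推广:连通和取代不相交并,零空间嵌入不再逐分量常值,因此需要专门的分解算法。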

【11】 Structured second-order methods via natural gradient descent 标题:基于自然梯度下降的结构化二阶方法

作者:Wu Lin,Frank Nielsen,Mohammad Emtiyaz Khan,Mark Schmidt 机构:University of British Columbia, Alberta Machine Intelligence Institute 备注:ICML workshop paper. arXiv admin note: substantial text overlap with arXiv:2102.07405 链接:https://arxiv.org/abs/2107.10884 摘要:在本文中,我们提出了新的结构化二阶方法和结构化自适应梯度方法,它们通过在结构化参数空间上执行自然梯度下降得到。在无梯度、自适应梯度和二阶方法等许多设定中,自然梯度下降都是设计新算法的一种有吸引力的途径。我们的结构化方法不仅具有结构不变性,而且具有简单的表达式。最后,我们在确定性非凸问题和深度学习问题上检验了所提方法的效率。 摘要:In this paper, we propose new structured second-order methods and structured adaptive-gradient methods obtained by performing natural-gradient descent on structured parameter spaces. Natural-gradient descent is an attractive approach to design new algorithms in many settings such as gradient-free, adaptive-gradient, and second-order methods. Our structured methods not only enjoy a structural invariance but also admit a simple expression. Finally, we test the efficiency of our proposed methods on both deterministic non-convex problems and deep learning problems.
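作为示意(假设性示例,并非论文的结构化参数空间),下面在一维高斯分布 N(mu, sigma^2) 的参数 (mu, log sigma) 上做自然梯度上升:该族的 Fisher 矩阵是对角的 diag(1/sigma^2, 2),用其逆对普通梯度做预条件即得自然梯度步。

```python
import numpy as np

def natural_gradient_fit(data, lr=0.5, steps=100):
    """对一维高斯 N(mu, sigma^2) 的对数似然做自然梯度上升,
    参数取 (mu, log sigma);该族的 Fisher 矩阵为 diag(1/sigma^2, 2)。"""
    mu, log_s = 0.0, 0.0
    for _ in range(steps):
        s2 = np.exp(2 * log_s)
        g_mu = np.mean(data - mu) / s2              # d loglik / d mu
        g_ls = np.mean((data - mu) ** 2) / s2 - 1   # d loglik / d log sigma
        mu += lr * s2 * g_mu       # 用 F^{-1} 预条件:乘 sigma^2
        log_s += lr * g_ls / 2.0   # 用 F^{-1} 预条件:除以 2
    return mu, np.exp(log_s)

rng = np.random.default_rng(0)
data = rng.normal(3.0, 2.0, size=10_000)
mu, sigma = natural_gradient_fit(data)
print(round(mu, 2), round(sigma, 2))  # 约 3.0 与 2.0
```

经 Fisher 逆预条件后,mu 的更新步长自动随 sigma^2 缩放——这正是自然梯度对参数化方式不敏感的体现;论文把这种做法推广到带结构(如稀疏、低秩)的协方差参数空间。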

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-07-26,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号。
