
Machine Learning arXiv Daily Digest [12.24]

Author: arXiv Daily Academic Digest (WeChat official account)
Published 2021-12-27 17:06:05

cs.LG: 82 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (1 paper)

【1】 ML4CO: Is GCNN All You Need? Graph Convolutional Neural Networks Produce Strong Baselines For Combinatorial Optimization Problems, If Tuned and Trained Properly, on Appropriate Data. Link: https://arxiv.org/abs/2112.12251

Authors: Amin Banitalebi-Dehkordi, Yong Zhang. Affiliations: Huawei Technologies Canada Co., Ltd. Note: Runner-up in the 2021 ML4CO NeurIPS Competition. Abstract: The 2021 NeurIPS Machine Learning for Combinatorial Optimization (ML4CO) competition was designed with the goal of improving state-of-the-art combinatorial optimization solvers by replacing key heuristic components with machine learning models. The competition's main scientific question was the following: is machine learning a viable option for improving traditional combinatorial optimization solvers on specific problem distributions, when historical data is available? This was motivated by the fact that in many practical scenarios, the data changes only slightly between repetitions of a combinatorial optimization problem, and this is an area where machine learning models are particularly powerful. This paper summarizes the solution and lessons learned by the Huawei EI-OROAS team in the dual task of the competition. Our team's submission achieved second place in the final ranking, with a very close distance to the first spot. In addition, our solution was consistently ranked first in several weekly leaderboard updates before the final evaluation. We provide insights gained from a large number of experiments, and argue that simple Graph Convolutional Neural Networks (GCNNs) can achieve state-of-the-art results if trained and tuned properly.
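
As background for the paper's central claim that a properly trained simple GCNN suffices, the standard graph-convolution propagation rule (Kipf & Welling) is reproduced below; this is generic context, not the authors' exact architecture. With $\tilde{A} = A + I$ the adjacency matrix with added self-loops and $\tilde{D}$ its degree matrix, each layer computes

$$H^{(l+1)} = \sigma\big(\tilde{D}^{-1/2}\,\tilde{A}\,\tilde{D}^{-1/2}\,H^{(l)}\,W^{(l)}\big),$$

i.e., node features are averaged over normalized neighborhoods, linearly transformed, and passed through a nonlinearity.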

Transformer (1 paper)

【1】 SeMask: Semantically Masked Transformers for Semantic Segmentation. Link: https://arxiv.org/abs/2112.12782

Authors: Jitesh Jain, Anukriti Singh, Nikita Orlov, Zilong Huang, Jiachen Li, Steven Walton, Humphrey Shi. Affiliations: Picsart AI Research (PAIR), IIT Roorkee. Note: 13 pages, 6 figures. Abstract: Finetuning a pretrained backbone in the encoder part of an image transformer network has been the traditional approach for the semantic segmentation task. However, such an approach leaves out the semantic context that an image provides during the encoding stage. This paper argues that incorporating semantic information of the image into pretrained hierarchical transformer-based backbones while finetuning improves the performance considerably. To achieve this, we propose SeMask, a simple and effective framework that incorporates semantic information into the encoder with the help of a semantic attention operation. In addition, we use a lightweight semantic decoder during training to provide supervision to the intermediate semantic prior maps at every stage. Our experiments demonstrate that incorporating semantic priors enhances the performance of the established hierarchical encoders with a slight increase in the number of FLOPs. We provide empirical proof by integrating SeMask into each variant of the Swin Transformer as our encoder paired with different decoders. Our framework achieves a new state-of-the-art of 58.22% mIoU on the ADE20K dataset and improvements of over 3% in the mIoU metric on the Cityscapes dataset. The code and checkpoints are publicly available at https://github.com/Picsart-AI-Research/SeMask-Segmentation

GAN | adversarial | attacks | generation (5 papers)

【1】 Manifold Learning Benefits GANs. Link: https://arxiv.org/abs/2112.12618

Authors: Yao Ni, Piotr Koniusz, Richard Hartley, Richard Nock. Affiliations: †The Australian National University, §Data61, CSIRO, ♦Google Research. Note: 30 pages, full version. Abstract: In this paper, we improve Generative Adversarial Networks by incorporating a manifold learning step into the discriminator. We consider locality-constrained linear and subspace-based manifolds, and locality-constrained non-linear manifolds. In our design, the manifold learning and coding steps are intertwined with layers of the discriminator, with the goal of attracting intermediate feature representations onto manifolds. We adaptively balance the discrepancy between feature representations and their manifold view, which represents a trade-off between denoising on the manifold and refining the manifold. We conclude that locality-constrained non-linear manifolds have the upper hand over linear manifolds due to their non-uniform density and smoothness. We show substantial improvements over different recent state-of-the-art baselines.

【2】 How Much of the Chemical Space Has Been Covered? Measuring and Improving the Variety of Candidate Set in Molecular Generation. Link: https://arxiv.org/abs/2112.12542

Authors: Yutong Xie, Ziqiao Xu, Jiaqi Ma, Qiaozhu Mei. Affiliations: School of Information, University of Michigan. Abstract: Forming a high-quality molecular candidate set that contains a wide range of dissimilar compounds is crucial to the success of drug discovery. However, compared to the research aiming at optimizing chemical properties, how to measure and improve the variety of drug candidates is relatively understudied. In this paper, we first investigate the problem of properly measuring the molecular variety through both an axiomatic analysis framework and an empirical study. Our analysis suggests that many existing measures are not suitable for evaluating the variety of molecules. We also propose new variety measures based on our analysis. We further explicitly integrate the proposed variety measures into the optimization objective of molecular generation models. Our experiment results demonstrate that this new optimization objective can guide molecular generation models to find compounds that cover a larger chemical space, providing the downstream phases with more distinctive drug candidate choices.
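
The abstract does not spell out the proposed measures, so the sketch below only illustrates a common baseline notion of candidate-set variety, the average pairwise Tanimoto distance over Morgan fingerprints, using RDKit; the SMILES strings are made-up examples.

```python
# Sketch: average pairwise Tanimoto distance as a simple variety baseline.
# The paper's axiomatic measures differ; this is illustrative only.
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

smiles = ["CCO", "c1ccccc1", "CC(=O)O", "CCN"]  # hypothetical candidate set
mols = [Chem.MolFromSmiles(s) for s in smiles]
fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

# Distance = 1 - similarity; a larger mean distance means a more varied set.
dists = [1.0 - DataStructs.TanimotoSimilarity(fps[i], fps[j])
         for i in range(len(fps)) for j in range(i + 1, len(fps))]
print(f"mean pairwise Tanimoto distance: {sum(dists) / len(dists):.3f}")
```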

【3】 Adaptive Modeling Against Adversarial Attacks. Link: https://arxiv.org/abs/2112.12431

Authors: Zhiwen Yan, Teck Khim Ng. Affiliations: School of Computing, National University of Singapore. Note: 10 pages, 3 figures. Abstract: Adversarial training, the process of training a deep learning model with adversarial data, is one of the most successful adversarial defense methods for deep learning models. We have found that the robustness to white-box attacks of an adversarially trained model can be further improved if we fine-tune this model at the inference stage to adapt to the adversarial input, using the extra information in it. We introduce an algorithm that "post-trains" the model at the inference stage between the original output class and a "neighbor" class, with existing training data. The accuracy of a pre-trained Fast-FGSM CIFAR10 classifier base model against white-box projected gradient descent (PGD) attacks can be significantly improved from 46.8% to 64.5% with our algorithm.

【4】 Revisiting and Advancing Fast Adversarial Training Through The Lens of Bi-Level Optimization. Link: https://arxiv.org/abs/2112.12376

Authors: Yihua Zhang, Guanhuan Zhang, Prashant Khanduri, Mingyi Hong, Shiyu Chang, Sijia Liu. Affiliations: Computer Science & Engineering, Michigan State University, USA; Computer Science, University of California, Santa Barbara; Electrical & Computer Engineering, University of Minnesota; MIT-IBM Watson AI Lab. ∗Equal contribution. Abstract: Adversarial training (AT) has become a widely recognized defense mechanism to improve the robustness of deep neural networks against adversarial attacks. It solves a min-max optimization problem, where the minimizer (i.e., defender) seeks a robust model to minimize the worst-case training loss in the presence of adversarial examples crafted by the maximizer (i.e., attacker). However, the min-max nature makes AT computationally intensive and thus difficult to scale. Meanwhile, the FAST-AT algorithm, and in fact many recent algorithms that improve AT, simplify the min-max based AT by replacing its maximization step with the simple one-shot gradient sign based attack generation step. Although easy to implement, FAST-AT lacks theoretical guarantees, and its practical performance can be unsatisfactory, suffering from robustness catastrophic overfitting when training with strong adversaries. In this paper, we propose to design FAST-AT from the perspective of bi-level optimization (BLO). We first make the key observation that the most commonly-used algorithmic specification of FAST-AT is equivalent to using some gradient descent-type algorithm to solve a bi-level problem involving a sign operation. However, the discrete nature of the sign operation makes it difficult to understand the algorithm performance. Based on the above observation, we propose a new tractable bi-level optimization problem, and design and analyze a new set of algorithms termed Fast Bi-level AT (FAST-BAT). FAST-BAT is capable of defending sign-based projected gradient descent (PGD) attacks without calling any gradient sign method and explicit robust regularization. Furthermore, we empirically show that our method outperforms state-of-the-art FAST-AT baselines, by achieving superior model robustness without inducing robustness catastrophic overfitting, or suffering from any loss of standard accuracy.
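
For context, the min-max objective of adversarial training and the one-shot gradient-sign (FGSM) step that FAST-AT substitutes for the inner maximization are the standard formulations below; the paper's Fast-BAT reformulation is a bi-level problem given in the original text.

$$\min_{\theta}\ \mathbb{E}_{(x,y)\sim\mathcal{D}}\Big[\max_{\|\delta\|_\infty \le \epsilon} \ell\big(f_\theta(x+\delta),\,y\big)\Big], \qquad \delta_{\text{FGSM}} = \epsilon \cdot \operatorname{sign}\big(\nabla_x\, \ell(f_\theta(x),\,y)\big).$$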

【5】 Crash Data Augmentation Using Conditional Generative Adversarial Networks (CGAN) for Improving Safety Performance Functions. Link: https://arxiv.org/abs/2112.12263

Authors: Mohammad Zarei, Bruce Hellinga. Affiliations: Department of Civil and Environmental Engineering, University of Waterloo, Waterloo, ON, Canada. Abstract: In this paper, we present a crash frequency data augmentation method based on Conditional Generative Adversarial Networks to improve crash frequency models. The proposed method is evaluated by comparing the performance of Base SPFs (developed using original data) and Augmented SPFs (developed using original data plus synthesised data) in terms of hotspot identification performance, model prediction accuracy, and dispersion parameter estimation accuracy. The experiments are conducted using simulated and real-world crash data sets. The results indicate that the crash data synthesised by CGAN have the same distribution as the original data and that the Augmented SPFs outperform Base SPFs in almost all aspects, especially when the dispersion parameter is low.
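
The standard conditional-GAN objective (Mirza & Osindero) underlying this kind of augmentation is reproduced below for reference; in this setting the condition $y$ would plausibly encode roadway or site attributes, which is our reading rather than a detail stated in the abstract.

$$\min_G \max_D\ \mathbb{E}_{x\sim p_{\text{data}}}\big[\log D(x \mid y)\big] + \mathbb{E}_{z\sim p_z}\big[\log\big(1 - D(G(z \mid y) \mid y)\big)\big].$$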

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (4 papers)

【1】 Improving Robustness and Uncertainty Modelling in Neural Ordinary Differential Equations. Link: https://arxiv.org/abs/2112.12707

Authors: Srinivas Anumasa, P. K. Srijith. Affiliations: Computer Science and Engineering, Indian Institute of Technology Hyderabad, India. Abstract: Neural ordinary differential equations (NODE) have been proposed as a continuous depth generalization to popular deep learning models such as Residual networks (ResNets). They provide parameter efficiency and automate the model selection process in deep learning models to some extent. However, they lack the much-required uncertainty modelling and robustness capabilities which are crucial for their use in several real-world applications such as autonomous driving and healthcare. We propose a novel and unique approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver. The proposed approach, latent time NODE (LT-NODE), treats $T$ as a latent variable and applies Bayesian learning to obtain a posterior distribution over $T$ from the data. In particular, we use variational inference to learn an approximate posterior and the model parameters. Prediction is done by considering the NODE representations from different samples of the posterior and can be done efficiently using a single forward pass. As $T$ implicitly defines the depth of a NODE, the posterior distribution over $T$ would also help in model selection in NODE. We also propose adaptive latent time NODE (ALT-NODE), which allows each data point to have a distinct posterior distribution over end-times. ALT-NODE uses amortized variational inference to learn an approximate posterior using inference networks. We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification data.

【2】 Human Activity Recognition on wrist-worn accelerometers using self-supervised neural networks. Link: https://arxiv.org/abs/2112.12272

Authors: Niranjan Sridhar, Lance Myers. Affiliations: Verily Life Sciences, LLC (Alphabet), South San Francisco, California, USA. Abstract: Measures of Activity of Daily Living (ADL) are an important indicator of overall health but difficult to measure in-clinic. Automated and accurate human activity recognition (HAR) using wrist-worn accelerometers enables practical and cost efficient remote monitoring of ADL. Key obstacles in developing high-quality HAR are the lack of large labeled datasets and the performance loss when applying models trained on small curated datasets to the continuous stream of heterogeneous data in real life. In this work we design a self-supervised learning paradigm to create a robust representation of accelerometer data that can generalize across devices and subjects. We demonstrate that this representation can separate activities of daily living and achieve strong HAR accuracy (on multiple benchmark datasets) using very few labels. We also propose a segmentation algorithm which can identify segments of salient activity and boost HAR accuracy on continuous real-life data.

【3】 Self-supervised Representation Learning of Neuronal Morphologies. Link: https://arxiv.org/abs/2112.12482

Authors: Marissa A. Weis, Laura Pede, Timo Lüddecke, Alexander S. Ecker. Affiliations: Institute of Computer Science and Campus Institute Data Science, University of Göttingen, Germany; Institute for Theoretical Physics, University of Tübingen. Abstract: Understanding the diversity of cell types and their function in the brain is one of the key challenges in neuroscience. The advent of large-scale datasets has given rise to the need of unbiased and quantitative approaches to cell type classification. We present GraphDINO, a purely data-driven approach to learning a low dimensional representation of the 3D morphology of neurons. GraphDINO is a novel graph representation learning method for spatial graphs utilizing self-supervised learning on transformer models. It smoothly interpolates between attention-based global interaction between nodes and classic graph convolutional processing. We show that this method is able to yield morphological cell type clustering that is comparable to manual feature-based classification and shows a good correspondence to expert-labeled cell types in two different species and cortical areas. Our method is applicable beyond neuroscience in settings where samples in a dataset are graphs and graph-level embeddings are desired.

【4】 Nonnegative OPLS for Supervised Design of Filter Banks: Application to Image and Audio Feature Extraction. Link: https://arxiv.org/abs/2112.12280

Authors: Sergio Muñoz-Romero, Jerónimo Arenas-García, Vanessa Gómez-Verdejo. Affiliations: Universidad Politécnica de Madrid. Abstract: Audio or visual data analysis tasks usually have to deal with high-dimensional and nonnegative signals. However, most data analysis methods suffer from overfitting and numerical problems when data have more than a few dimensions, needing a dimensionality reduction preprocessing. Moreover, interpretability about how and why filters work for audio or visual applications is a desired property, especially when energy or spectral signals are involved. In these cases, due to the nature of these signals, the nonnegativity of the filter weights is a desired property to better understand its working. Because of these two necessities, we propose different methods to reduce the dimensionality of data while the nonnegativity and interpretability of the solution are assured. In particular, we propose a generalized methodology to design filter banks in a supervised way for applications dealing with nonnegative data, and we explore different ways of solving the proposed objective function consisting of a nonnegative version of the orthonormalized partial least-squares method. We analyze the discriminative power of the features obtained with the proposed methods for two different and widely studied applications: texture and music genre classification. Furthermore, we compare the filter banks achieved by our methods with other state-of-the-art methods specifically designed for feature extraction.

Transfer | zero/few/one-shot | adaptation (3 papers)

【1】 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve. Link: https://arxiv.org/abs/2112.12668

Authors: Lei Wang, Jun Liu, Piotr Koniusz. Affiliations: †The Australian National University, ♠Singapore University of Technology and Design, §Data61, CSIRO. Note: Full 17 page version. Abstract: In this paper, we propose a Few-shot Learning pipeline for 3D skeleton-based action recognition by Joint tEmporal and cAmera viewpoiNt alIgnmEnt (JEANIE). To factor out misalignment between query and support sequences of 3D body joints, we propose an advanced variant of Dynamic Time Warping which jointly models each smooth path between the query and support frames to achieve simultaneously the best alignment in the temporal and simulated camera viewpoint spaces for end-to-end learning under the limited few-shot training data. Sequences are encoded with a temporal block encoder based on Simple Spectral Graph Convolution, a lightweight linear Graph Neural Network backbone (we also include a setting with a transformer). Finally, we propose a similarity-based loss which encourages the alignment of sequences of the same class while preventing the alignment of unrelated sequences. We demonstrate state-of-the-art results on NTU-60, NTU-120, Kinetics-skeleton and UWA3D Multiview Activity II.
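
JEANIE builds on Dynamic Time Warping; as background, the classic DTW recurrence over a query sequence $q$ and support sequence $s$ is shown below (the paper's variant additionally aligns over simulated camera viewpoints, for which see the original text). With $d(\cdot,\cdot)$ a frame-level distance,

$$D(i,j) = d(q_i, s_j) + \min\big\{D(i-1,j),\ D(i,j-1),\ D(i-1,j-1)\big\}.$$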

【2】 A Practical Data-Free Approach to One-shot Federated Learning with Heterogeneity. Link: https://arxiv.org/abs/2112.12371

Authors: Jie Zhang, Chen Chen, Bo Li, Lingjuan Lyu, Shuang Wu, Jianghe Xu, Shouhong Ding, Chao Wu. Affiliations: Zhejiang University, Tencent Youtu Lab, Sony AI. Abstract: One-shot Federated Learning (FL) has recently emerged as a promising approach, which allows the central server to learn a model in a single communication round. Despite the low communication cost, existing one-shot FL methods are mostly impractical or face inherent limitations, e.g., a public dataset is required, clients' models must be homogeneous, or additional data/model information must be uploaded. To overcome these issues, we propose a more practical data-free approach named FedSyn for the one-shot FL framework with heterogeneity. Our FedSyn trains the global model by a data generation stage and a model distillation stage. To the best of our knowledge, FedSyn is the first method that can be practically applied to various real-world applications due to the following advantages: (1) FedSyn requires no additional information (except the model parameters) to be transferred between clients and the server; (2) FedSyn does not require any auxiliary dataset for training; (3) FedSyn is the first to consider both model and statistical heterogeneities in FL, i.e., the clients' data are non-iid and different clients may have different model architectures. Experiments on a variety of real-world datasets demonstrate the superiority of our FedSyn. For example, FedSyn outperforms the best baseline method Fed-ADI by 5.08% on the CIFAR10 dataset when data are non-iid.

【3】 Combinations of Adaptive Filters. Link: https://arxiv.org/abs/2112.12245

Authors: Jerónimo Arenas-García, Luis A. Azpicueta-Ruiz, Magno T. M. Silva, Vitor H. Nascimento, Ali H. Sayed. Abstract: Adaptive filters are at the core of many signal processing applications, ranging from acoustic noise suppression to echo cancelation, array beamforming, channel equalization, to more recent sensor network applications in surveillance, target localization, and tracking. A trending approach in this direction is to recur to in-network distributed processing in which individual nodes implement adaptation rules and diffuse their estimation to the network. When the a priori knowledge about the filtering scenario is limited or imprecise, selecting the most adequate filter structure and adjusting its parameters becomes a challenging task, and erroneous choices can lead to inadequate performance. To address this difficulty, one useful approach is to rely on combinations of adaptive structures. The combination of adaptive filters exploits to some extent the same divide and conquer principle that has also been successfully exploited by the machine-learning community (e.g., in bagging or boosting). In particular, the problem of combining the outputs of several learning algorithms (mixture of experts) has been studied in the computational learning field under a different perspective: rather than studying the expected performance of the mixture, deterministic bounds are derived that apply to individual sequences and, therefore, reflect worst-case scenarios. These bounds require assumptions different from the ones typically used in adaptive filtering, which is the emphasis of this overview article. We review the key ideas and principles behind these combination schemes, with emphasis on design rules. We also illustrate their performance with a variety of examples.
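
As an illustration of the combination principle this overview surveys, here is a minimal sketch of the classic convex combination of two LMS filters with different step sizes; variable names and parameter values are made up, and the actual design rules are discussed in the paper.

```python
# Sketch: convex combination of a fast and a slow LMS filter.
# The mixing weight lam = sigmoid(a); a is adapted by stochastic gradient
# descent on the overall error, as in classic combination schemes.
import numpy as np

def combined_lms(x, d, M=8, mu_fast=0.05, mu_slow=0.005, mu_a=1.0):
    w1, w2 = np.zeros(M), np.zeros(M)  # fast / slow component filters
    a = 0.0                            # combination parameter
    y_out = np.zeros(len(d))
    for n in range(M, len(d)):
        u = x[n - M:n][::-1]           # input regressor (most recent first)
        y1, y2 = w1 @ u, w2 @ u
        lam = 1.0 / (1.0 + np.exp(-a))
        y = lam * y1 + (1.0 - lam) * y2            # combined output
        e, e1, e2 = d[n] - y, d[n] - y1, d[n] - y2
        w1 += mu_fast * e1 * u         # each component adapts independently
        w2 += mu_slow * e2 * u
        a += mu_a * e * (y1 - y2) * lam * (1.0 - lam)  # adapt the mixture
        a = np.clip(a, -4.0, 4.0)      # common trick: keep adaptation alive
        y_out[n] = y
    return y_out
```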

Reinforcement learning (8 papers)

【1】 A deep reinforcement learning model for predictive maintenance planning of road assets: Integrating LCA and LCCA. Link: https://arxiv.org/abs/2112.12589

Authors: Fateme Golivand Darvishvand, Moen Latifi. Affiliations: †These authors contributed equally to this work. Abstract: Road maintenance planning is an integral part of road asset management. One of the main challenges in Maintenance and Rehabilitation (M&R) practices is to determine maintenance type and timing. This research proposes a framework using Reinforcement Learning (RL) based on the Long Term Pavement Performance (LTPP) database to determine the type and timing of M&R practices. A predictive DNN model is first developed in the proposed algorithm, which serves as the Environment for the RL algorithm. For the Policy estimation of the RL model, both DQN and PPO models are developed. However, PPO has been selected in the end due to better convergence and higher sample efficiency. The indicators used in this study are International Roughness Index (IRI) and Rutting Depth (RD). Initially, we considered the Cracking Metric (CM) as a third indicator, but it was then excluded due to the much fewer data compared to the other indicators, which resulted in lower accuracy of the results. Furthermore, in the cost-effectiveness calculation (reward), we considered both the economic and environmental impacts of M&R treatments. Costs and environmental impacts have been evaluated with paLATE 2.0 software. Our method is tested on a hypothetical case study of a 23-kilometer six-lane highway located in Texas, which has a warm and wet climate. The results propose a 20-year M&R plan in which the road condition remains in an excellent condition range. Because the early state of the road is at a good level of service, there is no need for heavy maintenance practices in the first years. Later, after heavy M&R actions, there are several 1-2 year stretches with no need for treatments. All of these show that the proposed plan has a logical result. Decision-makers and transportation agencies can use this scheme to conduct better maintenance practices that can prevent budget waste and, at the same time, minimize the environmental impacts.

【2】 Newsvendor Model with Deep Reinforcement Learning. Link: https://arxiv.org/abs/2112.12544

Authors: Dylan K. Goetting. Note: 10 pages with 4 figures. Abstract: I present a deep reinforcement learning (RL) solution to the mathematical problem known as the Newsvendor model, which seeks to optimize profit given a probabilistic demand distribution. To reflect a more realistic and complex situation, the demand distribution can change for different days of the week, thus changing the optimum behavior. I used a Twin-Delayed Deep Deterministic Policy Gradient agent (written as completely original code) with both an actor and critic network to solve this problem. The agent was able to learn optimal behavior consistent with the analytical solution of the problem, and could identify separate probability distributions for different days of the week and behave accordingly.
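
The analytical solution the agent is compared against is the classic critical-ratio result: with unit price $p$, unit cost $c$, and demand CDF $F$, the optimal order quantity is $q^* = F^{-1}\big((p-c)/p\big)$. The snippet below evaluates it for a normal demand; all parameter values are hypothetical.

```python
# Sketch: analytical newsvendor solution via the critical ratio.
from scipy.stats import norm

p, c = 5.0, 2.0               # hypothetical unit price and unit cost
mu, sigma = 100.0, 20.0       # hypothetical demand mean and std
critical_ratio = (p - c) / p  # underage cost / (underage + overage cost)
q_star = norm.ppf(critical_ratio, loc=mu, scale=sigma)
print(f"critical ratio = {critical_ratio:.2f}, q* = {q_star:.1f}")
```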

【3】 The Impact of Missing Velocity Information in Dynamic Obstacle Avoidance based on Deep Reinforcement Learning. Link: https://arxiv.org/abs/2112.12465

Authors: Fabian Hart, Martin Waltz, Ostap Okhrin. Affiliations: Institute of Transportation Economics, Technische Universität Dresden, Germany. Abstract: We introduce a novel approach to dynamic obstacle avoidance based on Deep Reinforcement Learning by defining a traffic type independent environment with variable complexity. Filling a gap in the current literature, we thoroughly investigate the effect of missing velocity information on an agent's performance in obstacle avoidance tasks. This is a crucial issue in practice since several sensors yield only positional information of objects or vehicles. We evaluate frequently-applied approaches in scenarios of partial observability, namely the incorporation of recurrency in the deep neural networks and simple frame-stacking. For our analysis, we rely on state-of-the-art model-free deep RL algorithms. The lack of velocity information is found to significantly impact the performance of an agent. Both approaches - recurrency and frame-stacking - cannot consistently replace missing velocity information in the observation space. However, in simplified scenarios, they can significantly boost performance and stabilize the overall training procedure.
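
Frame-stacking, one of the two evaluated remedies, concatenates the last $k$ observations so that velocities become implicitly recoverable from positional differences. A minimal sketch (not the paper's implementation) follows.

```python
# Sketch: frame-stacking wrapper over position-only observations.
from collections import deque
import numpy as np

class FrameStack:
    """Sliding window of the last k observations, concatenated."""
    def __init__(self, k: int):
        self.k = k
        self.frames = deque(maxlen=k)

    def reset(self, obs: np.ndarray) -> np.ndarray:
        for _ in range(self.k):          # fill the window with the first frame
            self.frames.append(obs)
        return np.concatenate(self.frames)

    def step(self, obs: np.ndarray) -> np.ndarray:
        self.frames.append(obs)          # drop oldest, append newest
        return np.concatenate(self.frames)

fs = FrameStack(k=2)
print(fs.reset(np.array([0.0, 0.0])))    # [0. 0. 0. 0.]
print(fs.step(np.array([0.1, 0.2])))     # difference between halves ~ velocity
```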

【4】 Local Advantage Networks for Cooperative Multi-Agent Reinforcement Learning. Link: https://arxiv.org/abs/2112.12458

Authors: Raphaël Avalos, Mathieu Reymond, Ann Nowé, Diederik M. Roijers. Affiliations: Vrije Universiteit Brussel, HU Univ. of Appl. Sci. Utrecht. Abstract: Multi-agent reinforcement learning (MARL) enables us to create adaptive agents in challenging environments, even when the agents have limited observation. Modern MARL methods have hitherto focused on finding factorized value functions. While this approach has proven successful, the resulting methods have convoluted network structures. We take a radically different approach, and build on the structure of independent Q-learners. Inspired by influence-based abstraction, we start from the observation that compact representations of the observation-action histories can be sufficient to learn close to optimal decentralized policies. Combining this observation with a dueling architecture, our algorithm, LAN, represents these policies as separate individual advantage functions w.r.t. a centralized critic. These local advantage networks condition only on a single agent's local observation-action history. The centralized value function conditions on the agents' representations as well as the full state of the environment. The value function, which is cast aside before execution, serves as a stabilizer that coordinates the learning and to formulate DQN targets during learning. In contrast with other methods, this enables LAN to keep the number of network parameters of its centralized network independent of the number of agents, without imposing additional constraints like monotonic value functions. When evaluated on the StarCraft multi-agent challenge benchmark, LAN shows state-of-the-art performance and scores more than 80% wins in two previously unsolved maps `corridor' and `3s5z_vs_3s6z', leading to an improvement of 10% over QPLEX on average performance on the 14 maps. Moreover, when the number of agents becomes large, LAN uses significantly fewer parameters than QPLEX or even QMIX. We thus show that LAN's structure forms a key improvement that helps MARL methods remain scalable.

【5】 Safety and Liveness Guarantees through Reach-Avoid Reinforcement Learning. Link: https://arxiv.org/abs/2112.12288

Authors: Kai-Chieh Hsu, Vicenç Rubies-Royo, Claire J. Tomlin, Jaime F. Fisac. Affiliations: ∗Department of Electrical and Computer Engineering, Princeton University, United States; †Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, United States. Note: Accepted in Robotics: Science and Systems (RSS), 2021. Abstract: Reach-avoid optimal control problems, in which the system must reach certain goal conditions while staying clear of unacceptable failure modes, are central to safety and liveness assurance for autonomous robotic systems, but their exact solutions are intractable for complex dynamics and environments. Recent successes in reinforcement learning methods to approximately solve optimal control problems with performance objectives make their application to certification problems attractive; however, the Lagrange-type objective used in reinforcement learning is not suitable to encode temporal logic requirements. Recent work has shown promise in extending the reinforcement learning machinery to safety-type problems, whose objective is not a sum, but a minimum (or maximum) over time. In this work, we generalize the reinforcement learning formulation to handle all optimal control problems in the reach-avoid category. We derive a time-discounted reach-avoid Bellman backup with contraction mapping properties and prove that the resulting reach-avoid Q-learning algorithm converges under analogous conditions to the traditional Lagrange-type problem, yielding an arbitrarily tight conservative approximation to the reach-avoid set. We further demonstrate the use of this formulation with deep reinforcement learning methods, retaining zero-violation guarantees by treating the approximate solutions as untrusted oracles in a model-predictive supervisory control framework. We evaluate our proposed framework on a range of nonlinear systems, validating the results against analytic and numerical solutions, and through Monte Carlo simulation in previously intractable problems. Our results open the door to a range of learning-based methods for safe-and-live autonomous behavior, with applications across robotics and automation. See https://github.com/SafeRoboticsLab/safety_rl for code and supplementary material.

【6】 Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions. Link: https://arxiv.org/abs/2112.12281

Authors: Brett Daley, Christopher Amato. Affiliations: Khoury College of Computer Sciences, Northeastern University, Boston, MA, USA. Note: 12 pages, 0 figures. Abstract: Off-policy learning from multistep returns is crucial for sample-efficient reinforcement learning, particularly in the experience replay setting now commonly used with deep neural networks. Classically, off-policy estimation bias is corrected in a per-decision manner: past temporal-difference errors are re-weighted by the instantaneous Importance Sampling (IS) ratio (via eligibility traces) after each action. Many important off-policy algorithms such as Tree Backup and Retrace rely on this mechanism along with differing protocols for truncating ("cutting") the ratios ("traces") to counteract the excessive variance of the IS estimator. Unfortunately, cutting traces on a per-decision basis is not necessarily efficient; once a trace has been cut according to local information, the effect cannot be reversed later, potentially resulting in the premature truncation of estimated returns and slower learning. In the interest of motivating efficient off-policy algorithms, we propose a multistep operator that permits arbitrary past-dependent traces. We prove that our operator is convergent for policy evaluation, and for optimal control when targeting greedy-in-the-limit policies. Our theorems establish the first convergence guarantees for many existing algorithms including Truncated IS, Non-Markov Retrace, and history-dependent TD($\lambda$). Our theoretical results also provide guidance for the development of new algorithms that jointly consider multiple past decisions for better credit assignment and faster learning.
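
Retrace, cited here as a representative per-decision scheme, cuts each trace coefficient locally as $c_s = \lambda \min(1, \rho_s)$ with IS ratio $\rho_s = \pi(a_s \mid x_s)/\mu(a_s \mid x_s)$ (Munos et al., 2016); the paper's operator generalizes such per-decision traces to arbitrary past-dependent ones. The standard Retrace operator reads

$$\mathcal{R}Q(x,a) = Q(x,a) + \mathbb{E}_{\mu}\Big[\sum_{t \ge 0} \gamma^{t} \Big(\prod_{s=1}^{t} c_s\Big)\big(r_t + \gamma\,\mathbb{E}_{\pi} Q(x_{t+1}, \cdot) - Q(x_t, a_t)\big)\Big].$$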

【7】 Direct Behavior Specification via Constrained Reinforcement Learning. Link: https://arxiv.org/abs/2112.12228

Authors: Julien Roy, Roger Girgis, Joshua Romoff, Pierre-Luc Bacon, Christopher Pal. Affiliations: †Ubisoft La Forge, Institut d'intelligence artificielle du Québec (Mila), Polytechnique Montréal, Université de Montréal, Facebook CIFAR AI Chair, Canada CIFAR AI Chair. Abstract: The standard formulation of Reinforcement Learning lacks a practical way of specifying what are admissible and forbidden behaviors. Most often, practitioners go about the task of behavior specification by manually engineering the reward function, a counter-intuitive process that requires several iterations and is prone to reward hacking by the agent. In this work, we argue that constrained RL, which has almost exclusively been used for safe RL, also has the potential to significantly reduce the amount of work spent for reward specification in applied Reinforcement Learning projects. To this end, we propose to specify behavioral preferences in the CMDP framework and to use Lagrangian methods, which seek to solve a min-max problem between the agent's policy and the Lagrangian multipliers, to automatically weigh each of the behavioral constraints. Specifically, we investigate how CMDPs can be adapted in order to solve goal-based tasks while adhering to a set of behavioral constraints and propose modifications to the SAC-Lagrangian algorithm to handle the challenging case of several constraints. We evaluate this framework on a set of continuous control tasks relevant to the application of Reinforcement Learning for NPC design in video games.
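
The Lagrangian approach mentioned in the abstract solves, in its standard form, the min-max problem below, with one multiplier $\lambda_i \ge 0$ per behavioral constraint (expected cost $J_{C_i}$, threshold $d_i$); the paper's SAC-Lagrangian modifications are given in the original text.

$$\max_{\lambda \ge 0}\ \min_{\theta}\ -J_R(\pi_\theta) + \sum_i \lambda_i \big(J_{C_i}(\pi_\theta) - d_i\big).$$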

【8】 A Deep Reinforcement Learning Approach for Solving the Traveling Salesman Problem with Drone. Link: https://arxiv.org/abs/2112.12545

Authors: Aigerim Bogyrbayeva, Taehyun Yoon, Hanbum Ko, Sungbin Lim, Hyokun Yun, Changhyun Kwon. Affiliations: Suleyman Demirel University, Kazakhstan; UNIST, South Korea; Amazon, U.S.A.; University of South Florida, U.S.A.; KAIST, South Korea. Abstract: Reinforcement learning has recently shown promise in learning quality solutions in many combinatorial optimization problems. In particular, the attention-based encoder-decoder models show high effectiveness on various routing problems, including the Traveling Salesman Problem (TSP). Unfortunately, they perform poorly for the TSP with Drone (TSP-D), requiring routing a heterogeneous fleet of vehicles in coordination -- a truck and a drone. In TSP-D, the two vehicles are moving in tandem and may need to wait at a node for the other vehicle to join. A state-less attention-based decoder fails to achieve such coordination between vehicles. We propose an attention encoder-LSTM decoder hybrid model, in which the decoder's hidden state can represent the sequence of actions made. We empirically demonstrate that such a hybrid model improves upon a purely attention-based model for both solution quality and computational efficiency. Our experiments on the min-max Capacitated Vehicle Routing Problem (mmCVRP) also confirm that the hybrid model is more suitable for coordinated routing of multiple vehicles than the attention-based model.

Medical (4 papers)

【1】 INTRPRT: A Systematic Review of and Guidelines for Designing and Validating Transparent AI in Medical Image Analysis. Link: https://arxiv.org/abs/2112.12596

Authors: Haomin Chen, Catalina Gomez, Chien-Ming Huang, Mathias Unberath. Affiliations: Department of Computer Science, Johns Hopkins University. Abstract: Transparency in Machine Learning (ML) attempts to reveal the working mechanisms of complex models. Transparent ML promises to advance human factors engineering goals of human-centered AI in the target users. From a human-centered design perspective, transparency is not a property of the ML model but an affordance, i.e. a relationship between algorithm and user; as a result, iterative prototyping and evaluation with users is critical to attaining adequate solutions that afford transparency. However, following human-centered design principles in healthcare and medical image analysis is challenging due to the limited availability of and access to end users. To investigate the state of transparent ML in medical image analysis, we conducted a systematic review of the literature. Our review reveals multiple severe shortcomings in the design and validation of transparent ML for medical image analysis applications. We find that most studies to date approach transparency as a property of the model itself, similar to task performance, without considering end users during either development or evaluation. Additionally, the lack of user research, and the sporadic validation of transparency claims, put contemporary research on transparent ML for medical image analysis at risk of being incomprehensible to users, and thus, clinically irrelevant. To alleviate these shortcomings in forthcoming research while acknowledging the challenges of human-centered design in healthcare, we introduce the INTRPRT guideline, a systematic design directive for transparent ML systems in medical image analysis. The INTRPRT guideline suggests formative user research as the first step of transparent model design to understand user needs and domain requirements. Following this process produces evidence to support design choices, and ultimately, increases the likelihood that the algorithms afford transparency.

【2】 Analysis of ECG data to detect Atrial Fibrillation. Link: https://arxiv.org/abs/2112.12298

Authors: Arjun Sridharkumar, Sai Bhargav, Rahul Guntha. Affiliations: Dept of Computer Science, University of Alberta. Abstract: Atrial fibrillation (termed AF/Afib henceforth) is an irregular and often rapid heart rhythm that can lead to clots near the heart. As shown in fig. (1), we can detect Afib from the ECG signal by the absence of p waves and inconsistent intervals between R waves. Existing methods revolve around CNNs that are used to detect Afib, but most of them work with 12-lead ECG data, whereas in our case the health gauge watch deals with single-point ECG data. Twelve-lead ECG data is more accurate than single-point data. Furthermore, the health gauge watch data is much noisier. Implementing a model to detect Afib for the watch is a test of how the CNN is changed/modified to work with real-life data.

【3】 Maximum Entropy on Erroneous Predictions (MEEP): Improving model calibration for medical image segmentation. Link: https://arxiv.org/abs/2112.12218

Authors: Agostina Larrazabal, Cesar Martinez, Jose Dolz, Enzo Ferrante. Affiliations: CONICET, Universidad Nacional del Litoral, Argentina; ETS Montreal, Canada. Abstract: Modern deep neural networks have achieved remarkable progress in medical image segmentation tasks. However, it has recently been observed that they tend to produce overconfident estimates, even in situations of high uncertainty, leading to poorly calibrated and unreliable models. In this work we introduce Maximum Entropy on Erroneous Predictions (MEEP), a training strategy for segmentation networks which selectively penalizes overconfident predictions, focusing only on misclassified pixels. In particular, we design a regularization term that encourages high entropy posteriors for wrong predictions, increasing the network uncertainty in complex scenarios. Our method is agnostic to the neural architecture, does not increase model complexity and can be coupled with multiple segmentation loss functions. We benchmark the proposed strategy in two challenging medical image segmentation tasks: white matter hyperintensity lesions in magnetic resonance images (MRI) of the brain, and atrial segmentation in cardiac MRI. The experimental results demonstrate that coupling MEEP with standard segmentation losses leads to improvements not only in terms of model calibration, but also in segmentation quality.
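
Consistent with the abstract's description, a regularizer that rewards high-entropy posteriors only on misclassified pixels, one plausible form of the training objective is sketched below; the exact definition and weighting are in the original paper. With $\mathcal{M}$ the set of misclassified pixels and $H$ the Shannon entropy,

$$\mathcal{L} = \mathcal{L}_{\text{seg}} - \alpha \sum_{i \in \mathcal{M}} H\big(p_\theta(y_i \mid x)\big), \qquad H(p) = -\sum_k p_k \log p_k.$$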

【4】 Scalable Variational Quantum Circuits for Autoencoder-based Drug Discovery. Link: https://arxiv.org/abs/2112.12563

Authors: Junde Li, Swaroop Ghosh. Affiliations: Department of Computer Science and Engineering, The Pennsylvania State University. Note: Accepted at DATE 2022. Abstract: The de novo design of drug molecules is recognized as a time-consuming and costly process, and computational approaches have been applied in each stage of the drug discovery pipeline. Variational autoencoder is one of the computer-aided design methods which explores the chemical space based on existing molecular datasets. Quantum machine learning has emerged as an atypical learning method that may speed up some classical learning tasks because of its strong expressive power. However, near-term quantum computers suffer from a limited number of qubits, which hinders representation learning in high dimensional spaces. We present a scalable quantum generative autoencoder (SQ-VAE) for simultaneously reconstructing and sampling drug molecules, and a corresponding vanilla variant (SQ-AE) for better reconstruction. Architectural strategies for hybrid quantum-classical networks, such as adjustable quantum layer depth, heterogeneous learning rates, and patched quantum circuits, are proposed to learn high dimensional datasets such as ligand-targeted drugs. Extensive experimental results are reported for different dimensions, including 8x8 and 32x32, after choosing suitable architectural strategies. The performance of the quantum generative autoencoder is compared with the corresponding classical counterpart throughout all experiments. The results show that quantum computing advantages can be achieved for normalized low-dimension molecules, and that high-dimension molecules generated from quantum generative autoencoders have better drug properties within the same learning period.

Distillation | knowledge extraction (2 papers)

【1】 Distilling the Knowledge of Romanian BERTs Using Multiple Teachers. Link: https://arxiv.org/abs/2112.12650

Authors: Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiş, Dan Tufiş. Affiliations: Research Institute for Artificial Intelligence, Romanian Academy; University Politehnica of Bucharest; Tudor Vianu National College of Computer Science, Bucharest, Romania. Abstract: As transfer learning from large-scale pre-trained language models has become prevalent in Natural Language Processing, running these models in computationally constrained environments remains a challenging problem yet to address. Several solutions including knowledge distillation, network quantization and network pruning have been proposed; however, these approaches focus mostly on the English language, thus widening the gap when considering low-resource languages. In this work, we introduce three light and fast versions of distilled BERT models for the Romanian language: Distil-BERT-base-ro, Distil-RoBERT-base and DistilMulti-BERT-base-ro. The first two models resulted from individually distilling the knowledge of the two base versions of Romanian BERTs available in literature, while the last one was obtained by distilling their ensemble. To our knowledge, this is the first attempt to create publicly available Romanian distilled BERT models, which were thoroughly evaluated on five tasks: part-of-speech tagging, named entity recognition, sentiment analysis, semantic textual similarity and dialect identification. The experimental results on these benchmarks proved that our three distilled models retain most of their teachers' accuracy, while being twice as fast on a GPU and ~35% smaller. In addition, we further test the similarity between our students' and their teachers' predictions by measuring their label and probability loyalty, together with regression loyalty, a new metric introduced in this work.
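
For reference, the standard knowledge-distillation objective (Hinton et al.) combines hard-label cross-entropy with a temperature-softened KL term; distilling an ensemble, as done for the third model, commonly averages the teachers' soft labels, though the exact configuration used here is in the paper. With $p_S^{(T)}$ and $p_T^{(T)}$ the student and teacher distributions softened at temperature $T$,

$$\mathcal{L}_{\text{KD}} = \alpha\,\mathrm{CE}(y, p_S) + (1-\alpha)\,T^2\,\mathrm{KL}\big(p_T^{(T)} \,\|\, p_S^{(T)}\big).$$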

【2】 LAME: Layout Aware Metadata Extraction Approach for Research Articles. Link: https://arxiv.org/abs/2112.12353

Authors: Jongyun Choi, Hyesoo Kong, Hwamook Yoon, Heung-Seon Oh, Yuchul Jung. Affiliations: Department of Computer Engineering, Kumoh National Institute of Technology (KIT), Gumi, South Korea; Korea Institute of Science and Technology Information (KISTI), South Korea. Abstract: The volume of academic literature, such as academic conference papers and journals, has increased rapidly worldwide, and research on metadata extraction is ongoing. However, high-performing metadata extraction is still challenging due to diverse layout formats according to journal publishers. To accommodate the diversity of the layouts of academic journals, we propose a novel LAyout-aware Metadata Extraction (LAME) framework equipped with three characteristics (i.e., design of an automatic layout analysis, construction of a large metadata training set, and construction of Layout-MetaBERT). We designed an automatic layout analysis using PDFMiner. Based on the layout analysis, a large volume of metadata-separated training data, including the title, abstract, author name, author affiliated organization, and keywords, were automatically extracted. Moreover, we constructed Layout-MetaBERT to extract the metadata from academic journals with varying layout formats. The experimental results with Layout-MetaBERT exhibited robust performance (Macro-F1, 93.27%) in metadata extraction for unseen journals with different layout formats.
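
The layout analysis is built on PDFMiner; as a rough illustration of the kind of information it exposes, the sketch below collects first-page text blocks with their bounding boxes using pdfminer.six, from which title and author regions could be located heuristically. The function name and heuristics are ours, not LAME's.

```python
# Sketch: simple layout analysis with pdfminer.six, gathering text blocks
# and their coordinates (LAME's actual layout rules are in the paper).
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer

def first_page_blocks(pdf_path: str):
    """Return (bbox, text) for each text block on page 1, top blocks first."""
    blocks = []
    for page in extract_pages(pdf_path, maxpages=1):
        for element in page:
            if isinstance(element, LTTextContainer):
                blocks.append((element.bbox, element.get_text().strip()))
    # Blocks near the top of the page (large y1) often hold title/authors.
    return sorted(blocks, key=lambda b: -b[0][3])

# Usage example (path is hypothetical):
# for bbox, text in first_page_blocks("paper.pdf")[:5]:
#     print(bbox, text[:60])
```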

Recommendation (1 paper)

【1】 Comprehensive Movie Recommendation System. Link: https://arxiv.org/abs/2112.12463

Authors: Hrisav Bhowmick, Ananda Chatterjee, Jaydip Sen. Affiliations: Dept. of Data Science, Praxis Business School, Kolkata, India. Note: The paper was presented at the 8th International Conference on Business Analytics and Intelligence (ICBAI'21), December 20-22, 2021, Bangalore, India. This is the preprint of the published version that appears in the conference proceedings. It is eight pages long and consists of nine tables. Abstract: A recommender system, also known as a recommendation system, is a type of information filtering system that attempts to forecast a user's rating or preference for an item. This article designs and implements a complete movie recommendation system prototype based on Genre, Pearson Correlation Coefficient, Cosine Similarity, KNN-based, Content-Based Filtering using TFIDF and SVD, Collaborative Filtering using TFIDF and SVD, and Surprise-library-based recommendation system technology. Apart from that, in this paper we present a novel idea that applies machine learning techniques to construct clusters for movies based on genres and then determines the number of clusters by observing the inertia values. The constraints of the approaches discussed in this work have been described, as well as how one strategy overcomes the disadvantages of another. The whole work has been done on the MovieLens dataset available on the GroupLens website, which contains 100836 ratings and 3683 tag applications across 9742 movies. These data were created by 610 users between March 29, 1996, and September 24, 2018.
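
To illustrate one of the listed techniques, here is a minimal content-based recommender using TF-IDF over genre strings and cosine similarity with scikit-learn; the movie titles and genre strings are made-up stand-ins for MovieLens fields.

```python
# Sketch: content-based recommendation via TF-IDF + cosine similarity.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

genres = ["Action Adventure Sci-Fi", "Romance Drama", "Action Thriller", "Drama War"]
titles = ["Movie A", "Movie B", "Movie C", "Movie D"]   # hypothetical data

tfidf = TfidfVectorizer()                # vectorize genre descriptions
matrix = tfidf.fit_transform(genres)
sim = cosine_similarity(matrix)          # pairwise movie-movie similarity

def recommend(idx: int, k: int = 2):
    """Return the k movies most similar to movie idx, best first."""
    ranked = sim[idx].argsort()[::-1]
    return [titles[j] for j in ranked if j != idx][:k]

print(recommend(0))   # movies with genres closest to "Movie A"
```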

Clustering (1 paper)

【1】 Attentive Multi-View Deep Subspace Clustering Net (Link: https://arxiv.org/abs/2112.12506)

Authors: Run-kun Lu, Jian-wei Liu, Xin Zuo
Affiliations: Department of Automation, College of Information Science and Engineering, China University of Petroleum, Beijing, China
Abstract: In this paper, we propose a novel Attentive Multi-View Deep Subspace Net (AMVDSN), which deeply mines the underlying consistent and view-specific information across multiple views and fuses it according to each view's dynamic contribution, obtained through an attention mechanism. Most multi-view subspace learning methods reconstruct data points directly on the raw data, or consider only consistency or only complementarity when learning representations in a deep or shallow space. In contrast, our method seeks a joint latent representation that explicitly captures both the consensus and the view-specific information among multiple views, and then performs subspace clustering on this learned joint representation. Moreover, since different views contribute differently to representation learning, we introduce an attention mechanism to derive a dynamic weight for each view, which performs much better than previous fusion methods in the field of multi-view subspace clustering. The proposed algorithm is intuitive and, thanks to the neural-network framework, can be optimized simply with Stochastic Gradient Descent (SGD), while also providing strong nonlinear characterization ability compared with traditional subspace clustering approaches. Experimental results on seven real-world data sets demonstrate the effectiveness of our proposed algorithm against several state-of-the-art subspace learning approaches.
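A minimal sketch of the attention-weighted fusion idea in PyTorch follows; the scoring network, dimensions, and softmax weighting are illustrative assumptions rather than the paper's exact architecture.

```python
# Sketch: fuse V view-specific embeddings with learned attention weights.
import torch
import torch.nn as nn

class AttentiveFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.score = nn.Linear(dim, 1)   # one scalar score per view embedding

    def forward(self, views):            # views: (batch, V, dim)
        w = torch.softmax(self.score(views), dim=1)   # (batch, V, 1) weights
        return (w * views).sum(dim=1)    # (batch, dim) joint representation

fused = AttentiveFusion(dim=64)(torch.randn(8, 3, 64))  # 3 views, batch of 8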

Autonomous Driving | Vehicles | Lane Detection etc. (3 papers)

【1】 Attention Based Communication and Control for Multi-UAV Path Planning (Link: https://arxiv.org/abs/2112.12584)

Authors: Hamid Shiri, Hyowoon Seo, Jihong Park, Mehdi Bennis
Affiliations: Seoul National University (SNU)
Note: 6 pages, 6 figures, submitted for possible publication
Abstract: Inspired by the multi-head attention (MHA) mechanism in natural language processing, this letter proposes an iterative single-head attention (ISHA) mechanism for multi-UAV path planning. The ISHA mechanism is run by a communication helper that collects the state embeddings of the UAVs and distributes an attention score vector to each UAV. The attention scores computed by ISHA identify how many interactions with other UAVs should be considered in each UAV's control decision-making. Simulation results corroborate that the ISHA-based communication and control framework achieves faster travel with lower inter-UAV collision risks than an MHA-aided baseline, particularly under limited communication resources.

【2】 Statistical Feature-based Personal Information Detection in Mobile Network Traffic (Link: https://arxiv.org/abs/2112.12346)

Authors: Shuang Zhao, Shuhui Chen, Ziling Wei
Affiliations: School of Computer, National University of Defense Technology, Changsha, China
Abstract: With the popularity of smartphones, mobile applications (apps) have penetrated people's daily lives. Although apps provide rich functionalities, they also access a large amount of personal information. As a result, privacy concerns are raised. To understand what personal information apps collect, many solutions have been proposed to detect privacy leaks in apps. Recently, traffic monitoring-based privacy leak detection has shown promising performance and strong scalability, but it still has shortcomings. Firstly, it struggles to detect the leakage of obfuscated personal information. Secondly, it cannot discover privacy leaks of undefined types. To solve these problems, this paper proposes a new personal information detection method based on traffic monitoring. Statistical features of personal information are designed to depict the occurrence patterns of personal information in the traffic, including both local and global patterns. A detector is then trained with machine learning algorithms to discover potential personal information with similar patterns. Since the statistical features are independent of the value and type of the personal information, the trained detector is capable of identifying various types of privacy leaks as well as obfuscated ones. As far as we know, this is the first work that detects personal information based on statistical features. Finally, experimental results show that the proposed method achieves better performance than the state of the art.

【3】 Beyond Low Earth Orbit: Biological Research, Artificial Intelligence, and Self-Driving Labs (Link: https://arxiv.org/abs/2112.12582)

Authors: Lauren M. Sanders, Jason H. Yang, Ryan T. Scott, Amina Ann Qutub, Hector Garcia Martin, Daniel C. Berrios, Jaden J. A. Hastings, Jon Rask, Graham Mackintosh, Adrienne L. Hoarfrost, Stuart Chalk, John Kalantari, Kia Khezeli, Erik L. Antonsen, Joel Babdor, Richard Barker, Sergio E. Baranzini, Afshin Beheshti, Guillermo M. Delgado-Aparicio, Benjamin S. Glicksberg, Casey S. Greene, Melissa Haendel, Arif A. Hamid, Philip Heller, Daniel Jamieson, Katelyn J. Jarvis, Svetlana V. Komarova, Matthieu Komorowski, Prachi Kothiyal, Ashish Mahabal, Uri Manor, Christopher E. Mason, Mona Matar, George I. Mias, Jack Miller, Jerry G. Myers Jr., Charlotte Nelson, Jonathan Oribello, Seung-min Park, Patricia Parsons-Wingerter, R. K. Prabhu, Robert J. Reynolds, Amanda Saravia-Butler, Suchi Saria, Aenor Sawyer, Nitin Kumar Singh, Frank Soboczenski, Michael Snyder, Karthik Soman, Corey A. Theriot, David Van Valen, Kasthuri Venkateswaran, Liz Warren, Liz Worthey, Marinka Zitnik, Sylvain V. Costes
Affiliations: Blue Marble Space Institute of Science; Space Biosciences Division, NASA Ames Research Center; Center for Emerging and Re-Emerging Pathogens, Department of Microbiology, Biochemistry and
Note: 28 pages, 4 figures
Abstract: Space biology research aims to understand the fundamental effects of spaceflight on organisms, develop foundational knowledge to support deep space exploration, and ultimately bioengineer spacecraft and habitats to stabilize the ecosystem of plants, crops, microbes, animals, and humans for sustained multi-planetary life. To advance these aims, the field leverages experiments, platforms, data, and model organisms from both spaceborne and ground-analog studies. As research is extended beyond low Earth orbit, experiments and platforms must be maximally autonomous, light, agile, and intelligent to expedite knowledge discovery. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration on artificial intelligence, machine learning, and modeling applications that offer key solutions toward these space biology challenges. In the next decade, the synthesis of artificial intelligence into the field of space biology will deepen the biological understanding of spaceflight effects, facilitate predictive modeling and analytics, support maximally autonomous and reproducible experiments, and efficiently manage spaceborne data and metadata, all with the goal of enabling life to thrive in deep space.

Reasoning | Analysis | Understanding | Explanation (4 papers)

【1】 Forward Composition Propagation for Explainable Neural Reasoning (Link: https://arxiv.org/abs/2112.12717)

Authors: Isel Grau, Gonzalo Nápoles, Marilyn Bello, Yamisleydi Salgueiro
Affiliations: Information Systems Group, Eindhoven University of Technology, The Netherlands; Department of Cognitive Science & Artificial Intelligence, Tilburg University, The Netherlands; Department of Computer Science, Central University of Las Villas, Cuba
Abstract: This paper proposes an algorithm called Forward Composition Propagation (FCP) to explain the predictions of feed-forward neural networks operating on structured pattern recognition problems. In the proposed FCP algorithm, each neuron is described by a composition vector indicating the role of each problem feature in that neuron. Composition vectors are initialized using a given input instance and subsequently propagated through the whole network until they reach the output layer. It is worth mentioning that the algorithm is executed once the network's training is done. The sign of each composition value indicates whether the corresponding feature excites or inhibits the neuron, while the absolute value quantifies the magnitude of that impact. To validate the correctness of the FCP algorithm, we develop a case study concerning bias detection in a state-of-the-art problem in which the ground truth is known. The simulation results show that the composition values closely align with the expected behavior of protected features.
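The sketch below illustrates one plausible reading of composition propagation in NumPy: per-feature composition vectors are weighted by the trained connection weights and renormalized layer by layer. The propagation rule, the weights, and the input are illustrative assumptions, not the paper's exact update.

```python
# Sketch: propagate feature-composition vectors through dense layers.
import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(3, 2))  # stand-in trained weights
x = np.array([0.2, -1.0, 0.5, 0.8])                        # one input instance

# Each input neuron starts with a one-hot composition: feature i explains input i.
comp = np.eye(4) * x[:, None]                # (features, neurons) at layer 0

def propagate(comp, W):
    out = comp @ W                           # weight compositions by connections
    return out / (np.abs(out).sum(axis=0, keepdims=True) + 1e-12)

comp = propagate(propagate(comp, W1), W2)    # (4 features, 2 output neurons)
print(comp)  # sign = excites/inhibits; magnitude = strength of the influence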

【2】 Explainable Artificial Intelligence Methods in Combating Pandemics: A Systematic Review (Link: https://arxiv.org/abs/2112.12705)

Authors: Felipe Giuste, Wenqi Shi, Yuanda Zhu, Tarun Naren, Monica Isgut, Ying Sha, Li Tong, Mitali Gupte, May D. Wang
Affiliations: Georgia Institute of Technology
Note: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible. arXiv admin note: text overlap with arXiv:2006.11371 by other authors
Abstract: Despite the myriad of peer-reviewed papers demonstrating novel Artificial Intelligence (AI)-based solutions to COVID-19 challenges during the pandemic, few have made a significant clinical impact. The impact of artificial intelligence during the COVID-19 pandemic was greatly limited by a lack of model transparency. This systematic review examines the use of Explainable Artificial Intelligence (XAI) during the pandemic and how its use could overcome barriers to real-world success. We find that successful use of XAI can improve model performance, instill trust in the end user, and provide the value needed to affect user decision-making. We introduce the reader to common XAI techniques, their utility, and specific examples of their application. Evaluation of XAI results is also discussed as an important step in maximizing the value of AI-based clinical decision support systems. We illustrate the classical, modern, and potential future trends of XAI to elucidate the evolution of novel XAI techniques. Finally, we provide a checklist of suggestions for the experimental design process, supported by recent publications. Common challenges during the implementation of AI solutions are also addressed, with specific examples of potential solutions. We hope this review may serve as a guide to improve the clinical impact of future AI-based solutions.

【3】 AcME -- Accelerated Model-agnostic Explanations: Fast Whitening of the Machine-Learning Black Box (Link: https://arxiv.org/abs/2112.12635)

Authors: David Dandolo, Chiara Masiero, Mattia Carletti, Davide Dalle Pezze, Gian Antonio Susto
Affiliations: Università degli Studi di Padova
Abstract: In the context of human-in-the-loop Machine Learning applications, like Decision Support Systems, interpretability approaches should provide actionable insights without making users wait. In this paper, we propose Accelerated Model-agnostic Explanations (AcME), an interpretability approach that quickly provides feature importance scores at both the global and the local level. AcME can be applied a posteriori to any regression or classification model. Not only does AcME compute feature rankings, it also provides a what-if analysis tool to assess how changes in feature values would affect model predictions. We evaluated the proposed approach on synthetic and real-world datasets, also in comparison with SHapley Additive exPlanations (SHAP), the approach we drew inspiration from, which is currently one of the state-of-the-art model-agnostic interpretability approaches. We achieved comparable results in terms of the quality of the produced explanations while dramatically reducing the computational time and providing consistent visualizations for global and local interpretations. To foster research in this field, and for the sake of reproducibility, we also provide a repository with the code used for the experiments.
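A minimal sketch of quantile-perturbation feature importance in the spirit of AcME, using scikit-learn; the scoring rule (mean absolute change in the average prediction across quantiles) is an illustrative assumption rather than AcME's exact formula.

```python
# Sketch: model-agnostic importance by sweeping each feature over its quantiles.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=5, random_state=0)
model = RandomForestRegressor(random_state=0).fit(X, y)

base = model.predict(X).mean()
quantiles = np.quantile(X, [0.1, 0.25, 0.5, 0.75, 0.9], axis=0)

importance = []
for j in range(X.shape[1]):
    deltas = []
    for q in quantiles[:, j]:            # set feature j to each quantile value
        Xq = X.copy()
        Xq[:, j] = q
        deltas.append(abs(model.predict(Xq).mean() - base))
    importance.append(np.mean(deltas))   # what-if effect of moving feature j
print(np.argsort(importance)[::-1])      # features ranked by importance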

【4】 Regularized Multivariate Analysis Framework for Interpretable High-Dimensional Variable Selection (Link: https://arxiv.org/abs/2112.12249)

Authors: Sergio Muñoz-Romero, Vanessa Gómez-Verdejo, Jerónimo Arenas-García
Affiliations: Department of Signal Processing and Communications, Universidad Rey Juan Carlos; Universidad Carlos III de Madrid
Abstract: Multivariate Analysis (MVA) comprises a family of well-known methods for feature extraction that exploit correlations among the input variables representing the data. One important property enjoyed by most such methods is uncorrelation among the extracted features. Recently, regularized versions of MVA methods have appeared in the literature, mainly with the goal of gaining interpretability of the solution. In these cases, the solutions can no longer be obtained in closed form, and more complex optimization methods that rely on the iteration of two steps are frequently used. This paper resorts to an alternative approach to solve this iterative problem efficiently. The main novelty of this approach lies in preserving several properties of the original methods, most notably the uncorrelation of the extracted features. Under this framework, we propose a novel method that takes advantage of the $\ell_{2,1}$ norm to perform variable selection during the feature extraction process. Experimental results on different problems corroborate the advantages of the proposed formulation in comparison to state-of-the-art formulations.

Classification | Recognition (7 papers)

【1】 Assessing the Impact of Attention and Self-Attention Mechanisms on the Classification of Skin Lesions (Link: https://arxiv.org/abs/2112.12748)

Authors: Rafael Pedro, Arlindo L. Oliveira
Affiliations: INESC-ID / Instituto Superior Técnico, Lisbon, Portugal
Abstract: Attention mechanisms have raised significant interest in the research community, since they promise significant improvements in the performance of neural network architectures. However, in any specific problem, we still lack a principled way to choose specific mechanisms and hyper-parameters that lead to guaranteed improvements. More recently, self-attention has been proposed and widely used in transformer-like architectures, leading to significant breakthroughs in some applications. In this work we focus on two forms of attention mechanisms: attention modules and self-attention. Attention modules are used to reweight the features of each layer's input tensor. Different modules have different ways to perform this reweighting in fully connected or convolutional layers. The attention models studied are completely modular, and in this work they are used with the popular ResNet architecture. Self-attention, originally proposed in the area of natural language processing, makes it possible to relate all the items in an input sequence. Self-attention is becoming increasingly popular in computer vision, where it is sometimes combined with convolutional layers, although some recent architectures do away with convolutions entirely. In this work, we study and perform an objective comparison of a number of different attention mechanisms in a specific computer vision task, the classification of samples in the widely used Skin Cancer MNIST dataset. The results show that attention modules do sometimes improve the performance of convolutional neural network architectures, but also that this improvement, although noticeable and statistically significant, is not consistent across different settings. The results obtained with self-attention mechanisms, on the other hand, show consistent and significant improvements, leading to the best results even in architectures with a reduced number of parameters.
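A minimal sketch of a squeeze-and-excitation-style attention module of the kind such studies compare, in PyTorch; the reduction ratio and gating design are illustrative assumptions.

```python
# Sketch: channel-attention module that reweights a conv layer's feature maps.
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                     # squeeze: global context
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                # per-channel weights in (0, 1)
        )

    def forward(self, x):                                # x: (batch, C, H, W)
        return x * self.gate(x)                          # excite: reweight channels

out = ChannelAttention(64)(torch.randn(2, 64, 28, 28))   # drop-in after a conv block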

【2】 Prolog-based agnostic explanation module for structured pattern classification (Link: https://arxiv.org/abs/2112.12641)

Authors: Gonzalo Nápoles, Fabian Hoitsma, Andreas Knoben, Agnieszka Jastrzebska, Maikel Leon Espinosa
Affiliations: Department of Cognitive Science & Artificial Intelligence, Tilburg University, The Netherlands; Warsaw University of Technology, Poland; Department of Business Technology, Miami Herbert Business School, University of Miami, USA
Abstract: This paper presents a Prolog-based reasoning module to generate counterfactual explanations given the predictions computed by a black-box classifier. The proposed symbolic reasoning module can also resolve what-if queries using the ground-truth labels instead of the predicted ones. Overall, our approach comprises four well-defined stages that can be applied to any structured pattern classification problem. Firstly, we pre-process the given dataset by imputing missing values and normalizing the numerical features. Secondly, we transform numerical features into symbolic ones using fuzzy clustering, such that the extracted fuzzy clusters are mapped to an ordered set of predefined symbols. Thirdly, we encode instances as Prolog rules using the nominal values, the predefined symbols, the decision classes, and the confidence values. Fourthly, we compute the overall confidence of each Prolog rule using fuzzy-rough set theory to handle the uncertainty caused by transforming numerical quantities into symbols. This step comes with an additional theoretical contribution: a new similarity function to compare the previously defined Prolog rules involving confidence values. Finally, we implement a chatbot as a proxy between human beings and the Prolog-based reasoning module to resolve natural language queries and generate counterfactual explanations. During numerical simulations using synthetic datasets, we study the performance of our system when using different fuzzy operators and similarity functions. Towards the end, we illustrate how our reasoning module works using different use cases.

【3】 Combining Minkowski and Chebyshev: New distance proposal and survey of distance metrics using k-nearest neighbours classifier (Link: https://arxiv.org/abs/2112.12549)

Authors: Érick Oliveira Rodrigues
Affiliations: Department of Computer Science, Universidade Federal de Itajubá (UNIFEI)
Note: Pattern Recognition Letters, 2018
Abstract: This work proposes a distance that combines the Minkowski and Chebyshev distances and can be seen as an intermediate distance. This combination not only achieves efficient run times in neighbourhood iteration tasks in Z^2, but also obtains good accuracies when coupled with the k-Nearest Neighbours (k-NN) classifier. The proposed distance is approximately 1.3 times faster than the Manhattan distance and 329.5 times faster than the Euclidean distance in discrete neighbourhood iterations. An accuracy analysis of the k-NN classifier is presented using a total of 33 datasets from the UCI repository, 15 distances, and values of k varying from 1 to 200. In this experiment, the proposed distance obtained accuracies better than the average more often than its counterparts (in 26 cases out of 33), and also obtained the best accuracy more frequently (in 9 out of 33 cases).
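A minimal sketch of plugging a Minkowski-Chebyshev mix into scikit-learn's k-NN; the convex combination below is an illustrative assumption, since the paper defines its own intermediate form.

```python
# Sketch: k-NN with a custom distance mixing Minkowski (p=1) and Chebyshev.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def mink_cheb(a, b, w=0.5, p=1):
    """Illustrative blend: w * Minkowski + (1 - w) * Chebyshev."""
    d = np.abs(a - b)
    return w * (d ** p).sum() ** (1 / p) + (1 - w) * d.max()

X, y = load_iris(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5, metric=mink_cheb)  # callable metric
print(cross_val_score(knn, X, y, cv=5).mean())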

【4】 Your Face Mirrors Your Deepest Beliefs: Predicting Personality and Morals through Facial Emotion Recognition (Link: https://arxiv.org/abs/2112.12455)

Authors: P. A. Gloor, A. Fronzetti Colladon, E. Altuntas, C. Cetinkaya, M. F. Kaiser, L. Ripperger, T. Schaefer
Affiliations: Department of Data Science, Lucerne University of Applied Sciences and Arts
Abstract: Can we really "read the mind in the eyes"? Moreover, can AI assist us in this task? This paper answers these two questions by introducing a machine learning system that predicts personality characteristics of individuals on the basis of their face. It does so by tracking the emotional response of the individual's face through facial emotion recognition (FER) while the individual watches a series of 15 short videos of different genres. To calibrate the system, we invited 85 people to watch the videos while their emotional responses were analyzed through their facial expressions. At the same time, these individuals also took four well-validated surveys of personality characteristics and moral values: the revised NEO FFI personality inventory, the Haidt moral foundations test, the Schwartz personal value system, and the domain-specific risk-taking scale (DOSPERT). We found that the personality characteristics and moral values of an individual can be predicted from their emotional response to the videos as shown on their face, with an accuracy of up to 86% using gradient-boosted trees. We also found that different personality characteristics are better predicted by different videos; in other words, no single video provides accurate predictions for all personality characteristics, but the responses to a mix of different videos allow for accurate prediction.

【5】 Morphological classifiers (Link: https://arxiv.org/abs/2112.12262)

Authors: É. O. Rodrigues, A. Conci, P. Liatsis
Affiliations: Institute of Science and Technology, Universidade Federal de Itajubá (UNIFEI), Minas Gerais, Brazil; Department of Computer Science, Universidade Federal Fluminense, Niterói, Rio de Janeiro, Brazil
Note: Pattern Recognition, 2018
Abstract: This work proposes a new type of classifier called the Morphological Classifier (MC). MCs aggregate concepts from mathematical morphology and supervised learning. The outcome of this aggregation is a family of classifiers that may preserve shape characteristics of the classes, subject to the choice of a stopping criterion and a structuring element. MCs are fundamentally based on set theory, and their classification model can itself be a mathematical set. Two types of morphological classifiers are proposed in the current work, namely the Morphological k-NN (MkNN) and the Morphological Dilation Classifier (MDC), which demonstrate the feasibility of the approach. This work provides evidence of the advantages of MCs, e.g., very fast classification times as well as competitive accuracy rates. The performance of MkNN and MDC was tested using p-dimensional datasets. MCs tied with or outperformed 14 well-established classifiers in 5 out of 8 datasets. On all occasions, the obtained accuracies were higher than the average accuracy obtained with all classifiers. Moreover, the proposed implementations utilize the power of Graphics Processing Units (GPUs) to speed up processing.
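A minimal sketch of a toy dilation-based classifier in the spirit of MDC, using SciPy: each class's training points are rasterized onto a grid and dilated in rounds until no free cell can be claimed. The grid, structuring element, and tie-handling are illustrative assumptions, not the paper's exact procedure.

```python
# Sketch: a toy Morphological Dilation Classifier on a discretized 2-D plane.
import numpy as np
from scipy.ndimage import binary_dilation

grid = (50, 50)
masks = [np.zeros(grid, bool), np.zeros(grid, bool)]   # one set per class
masks[0][10, 10] = masks[0][12, 15] = True             # class-0 training points
masks[1][40, 35] = masks[1][38, 30] = True             # class-1 training points

structure = np.ones((3, 3), bool)                      # structuring element
changed = True
while changed:                                         # stopping criterion
    grown = [binary_dilation(m, structure) for m in masks]
    free = ~(masks[0] | masks[1])
    new0 = grown[0] & free & ~grown[1]                 # contested cells stay free
    new1 = grown[1] & free & ~grown[0]
    masks[0] |= new0
    masks[1] |= new1
    changed = bool(new0.any() or new1.any())

# A query cell takes the class whose dilated set reached it first (None = tie).
cell = (30, 28)
label = 0 if masks[0][cell] else 1 if masks[1][cell] else None
print("class of cell", cell, "->", label)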

【6】 MC-DGCNN: A Novel DNN Architecture for Multi-Category Point Set Classification (Link: https://arxiv.org/abs/2112.12219)

Authors: Majid Farhadloo, Carl Molnar, Gaoxiang Luo, Yan Li, Shashi Shekhar, Rachel L. Maus, Svetomir N. Markovic, Raymond Moore, Alexey Leontovich
Abstract: Point set classification aims to build a representation learning model that distinguishes between spatial and categorical configurations of point set data. This problem is societally important, with applications in domains such as immunology and microbial ecology. It is challenging because the interactions between different categories of points are not always equal; as a result, the representation learning model must selectively learn the most relevant multi-categorical relationships. Related works are limited (1) in learning the importance of different multi-categorical relationships, especially for high-order interactions, and (2) in that they do not fully exploit the spatial distribution of points beyond simply measuring relative distance or applying a feed-forward neural network to the coordinates. To overcome these limitations, we leverage the dynamic graph convolutional neural network (DGCNN) architecture to design a novel multi-category DGCNN (MC-DGCNN), contributing location representation and point pair attention layers for multi-categorical point set classification. MC-DGCNN can identify the categorical importance of each point pair and extends this to N-way spatial relationships, while still preserving all the properties and benefits of DGCNN (e.g., differentiability). Experimental results show that the proposed architecture is computationally efficient and significantly outperforms current deep learning architectures on real-world datasets.

【7】 Optimal learning of high-dimensional classification problems using deep neural networks (Link: https://arxiv.org/abs/2112.12555)

Authors: Philipp Petersen, Felix Voigtlaender
Abstract: We study the problem of learning classification functions from noiseless training samples, under the assumption that the decision boundary is of a certain regularity. We establish universal lower bounds for this estimation problem, for general classes of continuous decision boundaries. For the class of locally Barron-regular decision boundaries, we find that the optimal estimation rates are essentially independent of the underlying dimension and can be realized by empirical risk minimization methods over a suitable class of deep neural networks. These results are based on novel estimates of the $L^1$ and $L^\infty$ entropies of the class of Barron-regular functions.

Representation (1 paper)

【1】 Revisiting Transformation Invariant Geometric Deep Learning: Are Initial Representations All You Need? (Link: https://arxiv.org/abs/2112.12345)

Authors: Ziwei Zhang, Xin Wang, Zeyang Zhang, Peng Cui, Wenwu Zhu
Affiliations: Tsinghua University, Beijing, China
Note: 11 pages
Abstract: Geometric deep learning, i.e., designing neural networks to handle ubiquitous geometric data such as point clouds and graphs, has achieved great success in the last decade. One critical inductive bias is that the model can maintain invariance towards various transformations such as translation, rotation, and scaling. Existing graph neural network (GNN) approaches can only maintain permutation invariance, failing to guarantee invariance with respect to other transformations. Besides GNNs, other works design sophisticated transformation-invariant layers, which are computationally expensive and difficult to extend. To solve this problem, we revisit why existing neural networks cannot maintain transformation invariance when handling geometric data. Our findings show that transformation-invariant and distance-preserving initial representations are sufficient to achieve transformation invariance, without the need for sophisticated neural layer designs. Motivated by these findings, we propose Transformation Invariant Neural Networks (TinvNN), a straightforward and general framework for geometric data. Specifically, we realize transformation-invariant and distance-preserving initial point representations by modifying multi-dimensional scaling before feeding the representations into neural networks. We prove that TinvNN can strictly guarantee transformation invariance and is general and flexible enough to be combined with existing neural networks. Extensive experimental results on point cloud analysis and combinatorial optimization demonstrate the effectiveness and general applicability of our proposed method. Based on the experimental results, we advocate that TinvNN should be considered a new starting point and an essential baseline for further studies of transformation-invariant geometric deep learning.
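A minimal sketch of the preprocessing idea in scikit-learn: pairwise distances are unchanged by rotation and translation, so MDS coordinates computed from them give transformation-invariant, distance-preserving initial representations. The point cloud and downstream use are illustrative.

```python
# Sketch: MDS coordinates from pairwise distances are invariant to
# rotation/translation of the input point cloud (the distances don't change).
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 2))
theta = np.pi / 3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
moved = points @ R.T + np.array([5.0, -2.0])       # rotated and translated copy

def pairwise(P):
    return np.linalg.norm(P[:, None] - P[None, :], axis=-1)

print(np.allclose(pairwise(points), pairwise(moved)))   # True: same distances

embed = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
init_repr = embed.fit_transform(pairwise(moved))   # transformation-invariant
# init_repr would then be fed into an ordinary neural network (the TinvNN idea).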

Optimization | Convergence (6 papers)

【1】 Integrating Material Selection with Design Optimization via Neural Networks (Link: https://arxiv.org/abs/2112.12566)

Authors: Aaditya Chandrasekhar, Saketh Sridhara, Krishnan Suresh
Affiliations: Department of Mechanical Engineering, University of Wisconsin-Madison
Note: 16 pages, submitted to Structural and Multidisciplinary Optimization
Abstract: The engineering design process often entails optimizing the underlying geometry while simultaneously selecting a suitable material. For a certain class of simple problems, the two are separable: for example, one can first select an optimal material and then optimize the geometry. In general, however, the two are not separable. Furthermore, the discrete nature of material selection is not compatible with gradient-based geometry optimization, making simultaneous optimization challenging. In this paper, we propose the use of variational autoencoders (VAEs) for simultaneous optimization. First, a data-driven VAE is used to project the discrete material database onto a continuous and differentiable latent space. This is then coupled with a fully connected neural network, embedded with a finite-element solver, to simultaneously optimize the material and the geometry. The neural network's built-in gradient optimizer and back-propagation are exploited during optimization. The proposed framework is demonstrated using trusses, where an optimal material needs to be chosen from a database while simultaneously optimizing the cross-sectional areas of the truss members. Several numerical examples illustrate the efficacy of the proposed framework. The Python code used in these experiments is available at github.com/UW-ERSL/MaTruss

【2】 Model Selection in Batch Policy Optimization (Link: https://arxiv.org/abs/2112.12320)

Authors: Jonathan N. Lee, George Tucker, Ofir Nachum, Bo Dai
Affiliations: Stanford University; Google Research, Brain Team
Abstract: We study the problem of model selection in batch policy optimization: given a fixed, partial-feedback dataset and $M$ model classes, learn a policy with performance that is competitive with the policy derived from the best model class. We formalize the problem in the contextual bandit setting with linear model classes by identifying three sources of error that any model selection algorithm should optimally trade off in order to be competitive: (1) approximation error, (2) statistical complexity, and (3) coverage. The first two sources are common in model selection for supervised learning, where optimally trading off these properties is well studied. In contrast, the third source is unique to batch policy optimization and is due to the dataset shift inherent to the setting. We first show that no batch policy optimization algorithm can achieve a guarantee addressing all three simultaneously, revealing a stark contrast between the difficulties of batch policy optimization and the positive results available in supervised learning. Despite this negative result, we show that relaxing any one of the three error sources enables the design of algorithms achieving near-oracle inequalities for the remaining two. We conclude with experiments demonstrating the efficacy of these algorithms.

【3】 Simple and near-optimal algorithms for hidden stratification and multi-group learning (Link: https://arxiv.org/abs/2112.12181)

Authors: Christopher Tosh, Daniel Hsu
Affiliations: Memorial Sloan Kettering Cancer Center, New York, NY; Columbia University, New York, NY
Abstract: Multi-group agnostic learning is a formal learning criterion concerned with the conditional risks of predictors within subgroups of a population. The criterion addresses recent practical concerns such as subgroup fairness and hidden stratification. This paper studies the structure of solutions to the multi-group learning problem and provides simple and near-optimal algorithms for it.

【4】 Optimal and instance-dependent guarantees for Markovian linear stochastic approximation (Link: https://arxiv.org/abs/2112.12770)

Authors: Wenlong Mou, Ashwin Pananjady, Martin J. Wainwright, Peter L. Bartlett
Affiliations: Department of Electrical Engineering and Computer Sciences, and Department of Statistics, UC Berkeley; Schools of Industrial & Systems Engineering, and Electrical & Computer Engineering, Georgia Tech
Abstract: We study stochastic approximation procedures for approximately solving a $d$-dimensional linear fixed point equation based on observing a trajectory of length $n$ from an ergodic Markov chain. We first exhibit a non-asymptotic bound of the order $t_{\mathrm{mix}} \tfrac{d}{n}$ on the squared error of the last iterate of a standard scheme, where $t_{\mathrm{mix}}$ is a mixing time. We then prove a non-asymptotic instance-dependent bound on a suitably averaged sequence of iterates, with a leading term that matches the local asymptotic minimax limit, including sharp dependence on the parameters $(d, t_{\mathrm{mix}})$ in the higher-order terms. We complement these upper bounds with a non-asymptotic minimax lower bound that establishes the instance-optimality of the averaged SA estimator. We derive corollaries of these results for policy evaluation with Markov noise (covering the TD($\lambda$) family of algorithms for all $\lambda \in [0, 1)$) and for linear autoregressive models. Our instance-dependent characterizations open the door to the design of fine-grained model selection procedures for hyperparameter tuning (e.g., choosing the value of $\lambda$ when running the TD($\lambda$) algorithm).

【5】 Integrating Quantum Processor Device and Control Optimization in a Gradient-based Framework (Link: https://arxiv.org/abs/2112.12509)

Authors: Xiaotong Ni, Hui-Hai Zhao, Lei Wang, Feng Wu, Jianxin Chen
Affiliations: Alibaba Quantum Laboratory, Alibaba Group, Hangzhou, Zhejiang, P.R. China; Alibaba Quantum Laboratory, Alibaba Group, Beijing, P.R. China; Institute of Physics, Chinese Academy of Sciences, Beijing, P.R. China
Abstract: In a quantum processor, the device design and the external controls together determine the quality of the target quantum operations. As we continuously seek better alternative qubit platforms, we explore an increasingly large device and control design space. Thus, optimization becomes more and more challenging. In this work, we demonstrate that the figure of merit reflecting a design goal can be made differentiable with respect to the device and control parameters. In addition, we can compute the gradient of the design objective efficiently, in a manner similar to the back-propagation algorithm, and then utilize this gradient to optimize the device and control parameters jointly and efficiently. This extends the scope of quantum optimal control to superconducting device design. We also demonstrate the viability of gradient-based joint optimization over the device and control parameters through a few examples.

【6】 Decentralized Multi-Task Stochastic Optimization With Compressed Communications (Link: https://arxiv.org/abs/2112.12373)

Authors: Navjot Singh, Xuanyu Cao, Suhas Diggavi, Tamer Basar
Affiliations: University of California, Los Angeles, USA; Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong; University of Illinois Urbana-Champaign, Urbana, USA
Note: 31 pages, 4 figures
Abstract: We consider a multi-agent network where each node has a stochastic (local) cost function that depends on the decision variable of that node and a random variable, and where the decision variables of neighboring nodes are pairwise constrained. There is an aggregate objective function for the network, composed additively of the expected values of the local cost functions at the nodes, and the overall goal of the network is to obtain the minimizing solution to this aggregate objective function subject to all the pairwise constraints. This is to be achieved at the node level using decentralized information and local computation, with exchanges of only compressed information allowed between neighboring nodes. The paper develops algorithms and obtains performance bounds for two different models of local information availability at the nodes: (i) sample feedback, where each node has direct access to samples of the local random variable to evaluate its local cost, and (ii) bandit feedback, where samples of the random variables are not available, but only the values of the local cost functions at two random points close to the decision are available to each node. For both models, with compressed communication between neighbors, we have developed decentralized saddle-point algorithms that deliver performances no different (in order sense) from those without communication compression; specifically, we show that the deviation from the global minimum value and the violations of the constraints are upper-bounded by $\mathcal{O}(T^{-\frac{1}{2}})$ and $\mathcal{O}(T^{-\frac{1}{4}})$, respectively, where $T$ is the number of iterations. Numerical examples provided in the paper corroborate these bounds and demonstrate the communication efficiency of the proposed method.

Prediction | Estimation (1 paper)

【1】 Neuroevolution deep learning architecture search for estimation of river surface elevation from photogrammetric Digital Surface Models (Link: https://arxiv.org/abs/2112.12510)

Authors: Radosław Szostak, Marcin Pietroń, Mirosław Zimnoch, Przemysław Wachniew, Paweł Ćwiąkała, Edyta Puniach
Affiliations: AGH UST
Note: Extended version of a NeurIPS 2021 Workshop paper (ML4PhysicalSciences)
Abstract: Development of new methods for surface water observation is crucial given the increasingly frequent extreme hydrological events related to global warming and the increasing demand for water. Orthophotos and digital surface models (DSMs) obtained using UAV photogrammetry can be used to determine the water surface elevation (WSE) of a river. However, this task is difficult due to disturbances of the water surface on DSMs caused by limitations of photogrammetric algorithms. In this study, machine learning was used to extract a WSE value from disturbed photogrammetric data. A brand-new dataset was prepared specifically for this purpose by hydrology and photogrammetry experts. The new method is an important step toward automating water surface level measurements with high spatial and temporal resolution. Such data can be used to validate and calibrate hydrological, hydraulic, and hydrodynamic models, making hydrological forecasts more accurate, in particular for predicting extreme and dangerous events such as floods or droughts. To our knowledge, this is the first approach in which a dataset was created for this purpose and deep learning models were used for this task. Additionally, a neuroevolution algorithm was set up to explore different architectures to find locally optimal models, and non-gradient search was performed to fine-tune the model parameters. The achieved results have better accuracy compared to manual methods of determining WSE from photogrammetric DSMs.

Other Neural Networks | Deep Learning | Models | Modeling (19 papers)

【1】 Learning Cooperative Multi-Agent Policies with Partial Reward Decoupling (Link: https://arxiv.org/abs/2112.12740)

Authors: Benjamin Freed, Aditya Kapoor, Ian Abraham, Jeff Schneider, Howie Choset
Note: In IEEE Robotics and Automation Letters
Abstract: One of the preeminent obstacles to scaling multi-agent reinforcement learning to large numbers of agents is assigning credit to individual agents' actions. In this paper, we address this credit assignment problem with an approach that we call partial reward decoupling (PRD), which attempts to decompose large cooperative multi-agent RL problems into decoupled subproblems involving subsets of agents, thereby simplifying credit assignment. We empirically demonstrate that decomposing the RL problem using PRD in an actor-critic algorithm results in lower-variance policy gradient estimates, which improves data efficiency, learning stability, and asymptotic performance across a wide array of multi-agent RL tasks, compared to various other actor-critic approaches. Additionally, we relate our approach to counterfactual multi-agent policy gradient (COMA), a state-of-the-art MARL algorithm, and empirically show that our approach outperforms COMA by making better use of information in agents' reward streams, and by enabling recent advances in advantage estimation to be used.

【2】 Modeling Implicit Bias with Fuzzy Cognitive Maps (Link: https://arxiv.org/abs/2112.12713)

Authors: Gonzalo Nápoles, Isel Grau, Leonardo Concepción, Lisa Koutsoviti Koumeri, João Paulo Papa
Affiliations: Department of Cognitive Science & Artificial Intelligence, Tilburg University, The Netherlands; Information Systems Group, Eindhoven University of Technology, The Netherlands; Business Informatics Research Group, Hasselt University, Belgium
Abstract: This paper presents a Fuzzy Cognitive Map model to quantify implicit bias in structured datasets where features can be numeric or discrete. In our proposal, problem features are mapped to neural concepts that are initially activated by experts when running what-if simulations, whereas the weights connecting the neural concepts represent absolute correlation/association patterns between features. In addition, we introduce a new reasoning mechanism equipped with a normalization-like transfer function that prevents neurons from saturating. Another advantage of this new reasoning mechanism is that it can easily be controlled by regulating the nonlinearity when updating neurons' activation values in each iteration. Finally, we study the convergence of our model and derive analytical conditions concerning the existence and uniqueness of fixed-point attractors.
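A minimal sketch of a fuzzy cognitive map iteration with a normalization-like transfer function in NumPy; the weight matrix, initial activations, and the specific rescaling rule are illustrative assumptions, not the paper's exact mechanism.

```python
# Sketch: iterate an FCM until the activations settle at a fixed point.
import numpy as np

W = np.array([[0.0,  0.6, -0.3],     # W[i, j]: influence of concept i on j
              [0.4,  0.0,  0.5],
              [-0.2, 0.7,  0.0]])
a = np.array([1.0, 0.0, 0.5])        # expert-set initial activations

def step(a, W, lam=1.0):
    raw = a @ W
    raw = raw / (np.abs(raw).max() + 1e-12)   # normalization-like rescaling
    return np.tanh(lam * raw)                 # lam regulates the nonlinearity

for _ in range(50):
    nxt = step(a, W)
    if np.allclose(nxt, a, atol=1e-6):        # reached a fixed-point attractor
        break
    a = nxt
print(a)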

【3】 A Survey of Near-Data Processing Architectures for Neural Networks (Link: https://arxiv.org/abs/2112.12630)

Authors: Mehdi Hassanpour, Marc Riera, Antonio González
Abstract: Data-intensive workloads and applications, such as machine learning (ML), are fundamentally limited by traditional computing systems based on the von Neumann architecture. As data movement operations and energy consumption become key bottlenecks in the design of computing systems, interest in unconventional approaches such as Near-Data Processing (NDP), machine learning, and especially neural network (NN)-based accelerators has grown significantly. Emerging memory technologies, such as ReRAM and 3D-stacked memory, are promising for efficiently architecting NDP-based accelerators for NNs, due to their ability to work both as high-density/low-energy storage and as in/near-memory computation/search engines. In this paper, we present a survey of techniques for designing NDP architectures for NNs. By classifying the techniques based on the memory technology employed, we underscore their similarities and differences. Finally, we discuss open challenges and future perspectives that need to be explored in order to improve and extend the adoption of NDP architectures for future computing platforms. This paper will be valuable for computer architects, chip designers, and researchers in the area of machine learning.

【4】 Black-Box Testing of Deep Neural Networks through Test Case Diversity (Link: https://arxiv.org/abs/2112.12591)

Authors: Zohreh Aghababaeyan, Manel Abdellatif, Lionel Briand, Ramesh S, Mojtaba Bagherzadeh
Abstract: Deep Neural Networks (DNNs) have been extensively used in many areas, including image processing, medical diagnostics, and autonomous driving. However, DNNs can exhibit erroneous behaviours that may lead to critical errors, especially when used in safety-critical systems. Inspired by testing techniques for traditional software systems, researchers have proposed neuron coverage criteria, as an analogy to source code coverage, to guide the testing of DNN models. Despite very active research on DNN coverage, several recent studies have questioned the usefulness of such criteria in guiding DNN testing. Further, from a practical standpoint, these criteria are white-box, as they require access to the internals or training data of DNN models, which is in many contexts not feasible or convenient. In this paper, we investigate black-box input diversity metrics as an alternative to white-box coverage criteria. To this end, we first select and adapt three diversity metrics and study, in a controlled manner, their capacity to measure actual diversity in input sets. We then analyse their statistical association with fault detection using two datasets and three DNN models. We further compare diversity with state-of-the-art white-box coverage criteria. Our experiments show that relying on the diversity of image features embedded in test input sets is a more reliable indicator than coverage criteria for effectively guiding the testing of DNNs. Indeed, we found that one of our selected black-box diversity metrics far outperforms existing coverage criteria in terms of fault-revealing capability and computational time. The results also confirm suspicions that state-of-the-art coverage metrics are not adequate to guide the construction of test input sets aimed at detecting as many faults as possible with natural inputs.
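A minimal sketch of one black-box diversity measure of the kind such work studies, the log-determinant of a similarity kernel over input feature vectors, in NumPy; the feature vectors and the jitter term are illustrative assumptions, and the paper's chosen metrics may differ.

```python
# Sketch: geometric diversity of a test set from pairwise feature similarity.
import numpy as np

def geometric_diversity(features):
    """log det of the Gram matrix: larger means a more diverse test set."""
    F = features / np.linalg.norm(features, axis=1, keepdims=True)
    G = F @ F.T + 1e-6 * np.eye(len(F))      # jitter keeps G positive definite
    return np.linalg.slogdet(G)[1]

rng = np.random.default_rng(0)
tight = rng.normal(size=(10, 32)) * 0.01 + 1.0   # near-duplicate inputs
spread = rng.normal(size=(10, 32))               # varied inputs
print(geometric_diversity(tight) < geometric_diversity(spread))  # True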

【5】 Collaborative adversary nodes learning on the logs of IoT devices in an IoT network (Link: https://arxiv.org/abs/2112.12546)

Authors: Sandhya Aneja, Melanie Ang Xuan En, Nagender Aneja
Affiliations: School of Digital Science, Universiti Brunei Darussalam
Abstract: Artificial Intelligence (AI) development has encouraged many new research areas, including AI-enabled Internet of Things (IoT) networks. AI analytics and intelligent paradigms greatly improve learning efficiency and accuracy, and applying these learning paradigms to network scenarios provides technical advantages for new networking solutions. In this paper, we propose an improved approach to IoT security from a data perspective. The network traffic of IoT devices can be analyzed using AI techniques. The Adversary Learning (AdLIoTLog) model is proposed, using a Recurrent Neural Network (RNN) with an attention mechanism on sequences of network events in the network traffic. We define a network event as a sequence of time-series packets of protocols captured in the log. We consider different packet types (TCP, UDP, and HTTP packets) in the network log to make the algorithm robust. Distributed IoT devices can collaborate to cripple our world, which is extending to an Internet of Intelligence. The time-series packets are converted into structured data by removing noise and adding timestamps. The resulting dataset is used to train the RNN, which can detect pairs of nodes collaborating with each other. We used the BLEU score to evaluate model performance. Our results show that the predictive performance of the AdLIoTLog model trained by our method degrades by 3-4% in the presence of an attack, compared to the scenario where the network is not under attack. AdLIoTLog can detect adversaries because, when adversaries are present, the model gets duped by the collaborative events and therefore predicts the next event with a biased event rather than a benign one. We conclude that AI can provide ubiquitous learning for the new generation of the Internet of Things.

【6】 FourierMask: Instance Segmentation using Fourier Mapping in Implicit Neural Networks (Link: https://arxiv.org/abs/2112.12535)

Authors: Hamd ul Moqeet Riaz, Nuri Benbarka, Timon Hoeffer, Andreas Zell
Affiliations: Department of Computer Science (WSI), University of Tuebingen, Germany
Abstract: We present FourierMask, which employs Fourier series combined with implicit neural representations to generate instance segmentation masks. We apply a Fourier mapping (FM) to the coordinate locations and utilize the mapped features as inputs to an implicit representation (a coordinate-based multi-layer perceptron (MLP)). FourierMask learns to predict the coefficients of the FM for a particular instance, and therefore adapts the FM to a specific object. This allows FourierMask to be generalized to predict instance segmentation masks from natural images. Since implicit functions are continuous in the domain of the input coordinates, we illustrate that by sub-sampling the input pixel coordinates, we can generate higher-resolution masks during inference. Furthermore, we train a renderer MLP (FourierRend) on the uncertain predictions of FourierMask and illustrate that it significantly improves the quality of the masks. FourierMask shows competitive results on the MS COCO dataset compared to the baseline Mask R-CNN at the same output resolution and surpasses it at higher resolutions.
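A minimal sketch of Fourier-mapping pixel coordinates into a coordinate-based MLP, in PyTorch, following the standard Fourier-features recipe; the frequency matrix, network sizes, and sampling grid are illustrative assumptions rather than FourierMask's exact parameterization. Note how evaluating the continuous MLP on a denser coordinate grid yields a higher-resolution mask, as the abstract describes.

```python
# Sketch: Fourier-mapped (x, y) coordinates into a coordinate-based MLP.
import torch
import torch.nn as nn

n_freq, scale = 16, 10.0
B = torch.randn(2, n_freq) * scale            # fixed random frequency matrix

def fourier_map(coords):                      # coords: (N, 2) in [0, 1]
    proj = 2 * torch.pi * coords @ B          # (N, n_freq)
    return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)

mlp = nn.Sequential(nn.Linear(2 * n_freq, 64), nn.ReLU(),
                    nn.Linear(64, 1), nn.Sigmoid())   # per-pixel mask value

# Sampling the coordinate grid more densely at inference gives a finer mask.
ys, xs = torch.meshgrid(torch.linspace(0, 1, 128),
                        torch.linspace(0, 1, 128), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
mask = mlp(fourier_map(coords)).reshape(128, 128)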

【7】 PyCIL: A Python Toolbox for Class-Incremental Learning 标题:PyCIL:一个用于类增量学习的Python工具箱 链接:https://arxiv.org/abs/2112.12533

作者:Da-Wei Zhou,Fu-Yun Wang,Han-Jia Ye,De-Chuan Zhan 机构:State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China 备注:Technical report. Code is available at this https URL 摘要:传统的机器学习系统部署在封闭世界设定下,要求在离线训练过程开始前获得完整的训练数据。然而,现实世界的应用经常面临不断到来的新类别,模型应当持续地将其纳入。这种学习范式称为类增量学习(CIL)。我们提出了一个Python工具箱,它实现了若干类增量学习的关键算法,以减轻机器学习社区研究人员的负担。该工具箱既包含EWC和iCaRL等CIL奠基性工作的实现,也提供了可用于开展新的基础研究的当前最先进算法。该工具箱名为PyCIL(Python Class-Incremental Learning),可在 https://github.com/G-U-N/PyCIL 获取。 摘要:Traditional machine learning systems are deployed under the closed-world setting, which requires the entire training data before the offline training process. However, real-world applications often face the incoming new classes, and a model should incorporate them continually. The learning paradigm is called Class-Incremental Learning (CIL). We propose a Python toolbox that implements several key algorithms for class-incremental learning to ease the burden of researchers in the machine learning community. The toolbox contains implementations of a number of founding works of CIL such as EWC and iCaRL, but also provides current state-of-the-art algorithms that can be used for conducting novel fundamental research. This toolbox, named PyCIL for Python Class-Incremental Learning, is available at https://github.com/G-U-N/PyCIL
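PyCIL 的具体 API 以其仓库文档为准;下面给出一个与任何特定工具箱无关的、iCaRL 式"样例回放"类增量训练的最小示意,只为说明 CIL 范式本身(样例数、轮数等均为假设取值)。

```python
# 示意:样例回放(exemplar replay)式类增量学习的最小训练框架,非 PyCIL 的实际 API
import torch
import torch.nn as nn

def train_incremental(model, tasks, exemplars_per_class=20, epochs=5, lr=0.01):
    """tasks: [(images, labels), ...],每个元素是一批新类别的数据;
    model 的输出维度需覆盖所有将出现的类别。"""
    memory_x, memory_y = [], []
    for task_x, task_y in tasks:
        # 新任务数据与旧类样例联合训练,以缓解灾难性遗忘
        xs = torch.cat([task_x] + memory_x) if memory_x else task_x
        ys = torch.cat([task_y] + memory_y) if memory_y else task_y
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            opt.zero_grad()
            loss = nn.functional.cross_entropy(model(xs), ys)
            loss.backward()
            opt.step()
        # 为每个新类保留少量样例(此处简单取前 N 个;iCaRL 用更精细的 herding 选择)
        for c in task_y.unique():
            idx = (task_y == c).nonzero().squeeze(-1)[:exemplars_per_class]
            memory_x.append(task_x[idx])
            memory_y.append(task_y[idx])
    return model
```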

【8】 Curriculum Learning for Safe Mapless Navigation 标题:面向安全无地图导航的课程学习 链接:https://arxiv.org/abs/2112.12490

作者:Luca Marzari,Davide Corsi,Enrico Marchesini,Alessandro Farinelli 机构:Computer Science Department, University of Verona, Verona, Italy 备注:8 pages, 5 figures. The poster version of this paper has been accepted by The 37th ACM/SIGAPP Symposium on Applied Computing Proceedings (SAC IRMAS 2022) 摘要:这项工作研究了基于课程学习(CL)的方法对智能体性能的影响。我们特别关注无地图(mapless)机器人导航的安全性,并与标准端到端(E2E)训练策略进行比较。为此,我们提出了一种CL方法,在基于Unity的仿真中利用学习迁移(ToL)与微调,并以Robotnik Kairos作为机器人智能体。为了公平比较,我们的评估对每种学习方法采用相同的计算开销(即相同的交互次数与环境难度),结果确认我们采用ToL的CL方法优于E2E方法。特别地,我们提高了训练所得策略的平均成功率和安全性,在未见过的测试场景中碰撞次数减少了10%。为进一步证实这些结果,我们使用形式化验证工具来量化强化学习策略在期望规范下正确行为的数量。 摘要:This work investigates the effects of Curriculum Learning (CL)-based approaches on the agent's performance. In particular, we focus on the safety aspect of robotic mapless navigation, comparing over a standard end-to-end (E2E) training strategy. To this end, we present a CL approach that leverages Transfer of Learning (ToL) and fine-tuning in a Unity-based simulation with the Robotnik Kairos as a robotic agent. For a fair comparison, our evaluation considers an equal computational demand for every learning approach (i.e., the same number of interactions and difficulty of the environments) and confirms that our CL-based method that uses ToL outperforms the E2E methodology. In particular, we improve the average success rate and the safety of the trained policy, resulting in 10% fewer collisions in unseen testing scenarios. To further confirm these results, we employ a formal verification tool to quantify the number of correct behaviors of Reinforcement Learning policies over desired specifications.

【9】 Hierarchical Multi-Building And Multi-Floor Indoor Localization Based On Recurrent Neural Networks 标题:基于递归神经网络的分层多建筑多层室内定位 链接:https://arxiv.org/abs/2112.12478

作者:Abdalla Elmokhtar Ahmed Elesawi,Kyeong Soo Kim 机构:School of Advanced Technology, Xi'an Jiaotong-Liverpool University, Suzhou, P. R. China 备注:4 pages, 3 figures, the 6th International Workshop on GPU Computing and AI (GCA'21) 摘要:在现代城市中,人们的生活方式日益从户外转向室内,大型购物中心、室内运动场馆、工厂和仓库的出现加速了这一趋势。在这样的环境中,室内定位成为一项基本服务,而要部署的室内定位系统应具有足够的可扩展性,以覆盖这些室内设施的预期扩展。最经济实用的室内定位方法之一是Wi-Fi指纹识别,它利用广泛部署的Wi-Fi网络,使用移动设备(如智能手机),而无需对现有基础设施进行任何修改。传统的Wi-Fi指纹识别方案依赖复杂的数据前/后处理和耗时的手动参数调优。在本文中,我们提出基于递归神经网络(RNN)、使用Wi-Fi指纹的分层多建筑、多楼层室内定位方法,无需复杂的数据前/后处理,且参数调优更少。所提方案中的RNN以由一般到具体的顺序(例如,建筑物->楼层->位置)估计位置,以利用多建筑、多楼层环境中定位问题的层次性。在UJIIndoorLoc数据集上的实验结果表明,该方案对建筑物和楼层的估计精度分别为100%和95.24%,三维定位误差为8.62m,优于现有的基于深度神经网络的方案。 摘要:There has been an increasing tendency to move from outdoor to indoor lifestyle in modern cities. The emergence of big shopping malls, indoor sports complexes, factories, and warehouses is accelerating this tendency. In such an environment, indoor localization becomes one of the essential services, and the indoor localization systems to be deployed should be scalable enough to cover the expected expansion of those indoor facilities. One of the most economical and practical approaches to indoor localization is Wi-Fi fingerprinting, which exploits the widely-deployed Wi-Fi networks using mobile devices (e.g., smartphones) without any modification of the existing infrastructure. Traditional Wi-Fi fingerprinting schemes rely on complicated data pre/post-processing and time-consuming manual parameter tuning. In this paper, we propose hierarchical multi-building and multi-floor indoor localization based on a recurrent neural network (RNN) using Wi-Fi fingerprinting, eliminating the need of complicated data pre/post-processing and with less parameter tuning. The RNN in the proposed scheme estimates locations in a sequential manner from a general to a specific one (e.g., building->floor->location) in order to exploit the hierarchical nature of the localization in multi-building and multi-floor environments. The experimental results with the UJIIndoorLoc dataset demonstrate that the proposed scheme estimates building and floor with 100% and 95.24% accuracy, respectively, and provides three-dimensional positioning error of 8.62 m, which outperforms existing deep neural network-based schemes.
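下面的草图示意"由一般到具体"的分层输出结构(建筑 → 楼层 → 坐标):每一级的预测作为附加特征喂给下一级。具体维度、RNN 类型与 UJIIndoorLoc 的 520 个 AP 数目等均为示意取值,并非论文原始配置。

```python
# 示意:分层(建筑 -> 楼层 -> 坐标)Wi-Fi 指纹定位网络的一种可能实现
import torch
import torch.nn as nn

class HierLocNet(nn.Module):
    def __init__(self, hidden=128, n_buildings=3, n_floors=5):
        super().__init__()
        self.encoder = nn.GRU(input_size=1, hidden_size=hidden, batch_first=True)
        self.building_head = nn.Linear(hidden, n_buildings)
        self.floor_head = nn.Linear(hidden + n_buildings, n_floors)
        self.coord_head = nn.Linear(hidden + n_buildings + n_floors, 2)

    def forward(self, rssi):                        # rssi: (batch, n_aps)
        _, h = self.encoder(rssi.unsqueeze(-1))     # 将 RSSI 序列逐 AP 送入 GRU
        h = h.squeeze(0)
        b = self.building_head(h)                   # 先预测建筑
        f = self.floor_head(torch.cat([h, b.softmax(-1)], -1))  # 再预测楼层
        xy = self.coord_head(torch.cat([h, b.softmax(-1), f.softmax(-1)], -1))
        return b, f, xy                             # 由一般到具体的分层输出

b, f, xy = HierLocNet()(torch.randn(4, 520))        # 4 条指纹,520 个 AP(UJIIndoorLoc)
```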

【10】 Learning with distributional inverters 标题:利用分布求逆器学习 链接:https://arxiv.org/abs/2112.12340

作者:Eric Binnendyk,Marco Carmosino,Antonina Kolokolova,Ramyaa Ramyaa,Manuel Sabin 摘要:我们推广了Furst et al., 1991的“间接学习”技术,将可采样分布$\mu$上的概念类学习归约为均匀分布上同一概念类的学习。当$\mu$的采样器既属于目标概念类,又在Impagliazzo & Luby, 1989的意义下可有效求逆时,该归约即告成功。我们给出两个应用。其一,我们证明了AC0[q]在任何可简洁描述的乘积分布上都是可学习的。AC0[q]是多项式规模的常数深度布尔电路类,由扇入无界的AND、OR、NOT以及模$q$计数门构成。我们的算法在随机拟多项式时间内运行,并使用成员查询。其二,如果在Razborov & Rudich, 1997的意义下存在强有用(strongly useful)的自然性质——即能够区分随机字符串与具有非平凡电路复杂度的字符串的有效算法——那么在给定目标函数成员查询的条件下,一般的多项式规模布尔电路在任何可有效采样的分布上都能在随机多项式时间内学习。 摘要:We generalize the "indirect learning" technique of Furst et al., 1991 to reduce from learning a concept class over a samplable distribution $\mu$ to learning the same concept class over the uniform distribution. The reduction succeeds when the sampler for $\mu$ is both contained in the target concept class and efficiently invertible in the sense of Impagliazzo & Luby, 1989. We give two applications. - We show that AC0[q] is learnable over any succinctly-described product distribution. AC0[q] is the class of constant-depth Boolean circuits of polynomial size with AND, OR, NOT, and counting modulo $q$ gates of unbounded fanins. Our algorithm runs in randomized quasi-polynomial time and uses membership queries. - If there is a strongly useful natural property in the sense of Razborov & Rudich 1997 -- an efficient algorithm that can distinguish between random strings and strings of non-trivial circuit complexity -- then general polynomial-sized Boolean circuits are learnable over any efficiently samplable distribution in randomized polynomial time, given membership queries to the target function.

【11】 Physics Constrained Flow Neural Network for Short-Timescale Predictions in Data Communications Networks 标题:用于数据通信网短时间尺度预测的物理约束流神经网络 链接:https://arxiv.org/abs/2112.12321

作者:Xiangle Cheng,James He,Shihan Xiao,Yingxue Zhang,Zhitang Chen,Pascal Poupart,Fenglin Li 机构:Huawei Network Technology Lab, University of Waterloo, Huawei Technologies Canada, Huawei Noah's Ark Lab 摘要:在数据通信网络信息流动态分析的多种最新模型中,机器学习正获得越来越大的发展势头。这些初步模型通常依赖现成的学习模型,根据历史统计量进行预测,而忽略了支配这些流生成行为的物理规律。本文转而引入流神经网络(FlowNN),通过学到的物理偏置来改进特征表示。其实现方式是:在嵌入层之上引入一个归纳层(induction layer),以施加与物理相联系的数据相关性;并采用带停止梯度(stop-gradient)的自监督学习策略,使学到的物理规律具有普适性。对于短时间尺度的网络预测任务,FlowNN在合成与真实网络数据集上相比最先进的基线将损失降低了17%-71%,显示出这一新方法的威力。代码即将发布。 摘要:Machine learning is gaining growing momentum in various recent models for the dynamic analysis of information flows in data communications networks. These preliminary models often rely on off-the-shelf learning models to predict from historical statistics while disregarding the physics governing the generating behaviors of these flows. This paper instead introduces Flow Neural Network (FlowNN) to improve the feature representation with learned physical bias. This is implemented by an induction layer, working upon the embedding layer, to impose the physics connected data correlations, and a self-supervised learning strategy with stop-gradient to make the learned physics universal. For the short-timescale network prediction tasks, FlowNN achieves 17% - 71% of loss decrease than the state-of-the-art baselines on both synthetic and real-world networking datasets, which shows the strength of this new approach. Code will be made available.

【12】 Learning with Proper Partial Labels 标题:利用恰当的部分标签学习 链接:https://arxiv.org/abs/2112.12303

作者:Zhenguo Wu,Masashi Sugiyama 机构: The University of Tokyo , RIKEN AIP 摘要:部分标签学习是一种具有不精确标签的弱监督学习,其中对于每个训练示例,我们只给出一组候选标签,而不是一个真正的标签。最近,在不同的候选标签集生成模型下,人们提出了不同的部分标签学习方法。然而,这些方法需要对发电模型进行相对较强的分布假设。当假设不成立时,理论上无法保证方法的性能。在本文中,我们提出了部分标签上的适当性的概念。我们证明了这个适当的部分标签学习框架包括许多以前的部分标签学习设置作为特例。然后,我们推导了分类风险的统一无偏估计。通过得到估计误差界,证明了我们的估计量是风险一致的。最后,通过实验验证了算法的有效性。 摘要:Partial-label learning is a kind of weakly-supervised learning with inexact labels, where for each training example, we are given a set of candidate labels instead of only one true label. Recently, various approaches on partial-label learning have been proposed under different generation models of candidate label sets. However, these methods require relatively strong distributional assumptions on the generation models. When the assumptions do not hold, the performance of the methods is not guaranteed theoretically. In this paper, we propose the notion of properness on partial labels. We show that this proper partial-label learning framework includes many previous partial-label learning settings as special cases. We then derive a unified unbiased estimator of the classification risk. We prove that our estimator is risk-consistent by obtaining its estimation error bound. Finally, we validate the effectiveness of our algorithm through experiments.

【13】 Batch Processing and Data Streaming Fourier-based Convolutional Neural Network Accelerator 标题:基于傅里叶变换的批处理和数据流卷积神经网络加速器 链接:https://arxiv.org/abs/2112.12297

作者:Zibo Hu,Shurui Li,Russell L. T. Schwartz,Maria Solyanik-Gorgone,Mario Miscuglio,Puneet Gupta,Volker J. Sorger 机构:Department of Electrical and Computer Engineering, George Washington University, Washington DC, USA; Department of Electrical and Computer Engineering, University of California, Los Angeles, California, USA 备注:13 pages, 4 figures 摘要:在导航、跟踪与实时机器动作系统等众多应用中,人工神经网络以最小延迟做出决策至关重要,这要求机器学习硬件能够高吞吐地处理多维数据。卷积运算是数据分类任务的主要计算工具,但不幸的是,其运行时复杂度遵循一条颇具挑战性的缩放定律。然而,在傅里叶光学显示光处理器中以同态方式实现卷积定理,可对超过1000 x 1000的大矩阵数据输入实现非迭代的O(1)运行时复杂度。沿着这一思路,我们在此演示了使用傅里叶卷积神经网络(FCNN)加速器的数据流多核图像批处理:对大规模矩阵的图像批处理表现为由数字光处理模块在傅里叶域中以被动方式执行的200万次点积乘法。此外,我们通过利用多个空间并行的衍射级进一步并行化该光学FCNN系统,从而实现了相对最先进FCNN加速器98倍的吞吐量提升。文中全面讨论了在系统能力边缘工作所涉及的实际挑战,突出了傅里叶域中的串扰问题与分辨率缩放定律。利用显示技术中的大规模并行性来加速卷积,带来了一种非冯·诺依曼式的机器学习加速方案。 摘要:Decision-making by artificial neural networks with minimal latency is paramount for numerous applications such as navigation, tracking, and real-time machine action systems. This requires the machine learning hardware to handle multidimensional data with a high throughput. Processing convolution operations being the major computational tool for data classification tasks, unfortunately, follows a challenging run-time complexity scaling law. However, implementing the convolution theorem homomorphically in a Fourier-optic display-light-processor enables a non-iterative O(1) runtime complexity for data inputs beyond 1,000 x 1,000 large matrices. Following this approach, here we demonstrate data streaming multi-kernel image batch-processing with a Fourier Convolutional Neural Network (FCNN) accelerator. We show image batch processing of large-scale matrices as passive 2-million dot-product multiplications performed by digital light-processing modules in the Fourier domain. In addition, we parallelize this optical FCNN system further by utilizing multiple spatio-parallel diffraction orders, thus achieving a 98-times throughput improvement over state-of-the-art FCNN accelerators. The comprehensive discussion of the practical challenges related to working on the edge of the system's capabilities highlights issues of crosstalk in the Fourier domain and resolution scaling laws. Accelerating convolutions by utilizing the massive parallelism in display technology brings forth a non-von-Neumann-based machine learning acceleration.
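摘要的关键是卷积定理:频域中的一次逐点乘法等价于空域循环卷积,这正是光学系统能以 O(1) 完成卷积的原因。下面用 NumPy 做一个数值验证(尺寸为示意取值)。

```python
# 数值验证卷积定理:频域逐点相乘 == 空域循环卷积
import numpy as np

rng = np.random.default_rng(0)
N = 512
image = rng.standard_normal((N, N))
kernel = np.zeros((N, N))
kernel[:5, :5] = rng.standard_normal((5, 5))       # 5x5 核,零填充到图像大小

# 频域:FFT -> 逐点乘法 -> 逆 FFT(光学系统中该乘法由光的传播"被动"完成)
freq_result = np.fft.ifft2(np.fft.fft2(image) * np.fft.fft2(kernel)).real

# 空域参考:在 5x5 支撑上显式做周期边界卷积
ref = np.zeros_like(image)
for i in range(5):
    for j in range(5):
        ref += kernel[i, j] * np.roll(image, shift=(i, j), axis=(0, 1))

print(np.allclose(freq_result, ref, atol=1e-8))    # True:两者一致
```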

【14】 Algorithmic Probability of Large Datasets and the Simplicity Bubble Problem in Machine Learning 标题:大数据集的算法概率与机器学习中的简单性泡沫问题 链接:https://arxiv.org/abs/2112.12275

作者:Felipe S. Abrahão,Hector Zenil,Fabio Porto,Klaus Wehmuth 机构:oratory for Scientific Computing (LNCC),-, Petr´opolis, RJ, Brazil., for the Natural and Digital Sciences, Paris, France., The Alan Turing Institute, British Library,QR, Euston Rd, Lon-, don NW,DB. Algorithmic Dynamics Lab, Unit of Computational 摘要:在挖掘大型数据集以预测新数据时,统计机器学习背后原理的局限性不仅对大数据泛滥构成了严重挑战,也对数据生成过程偏向于低算法复杂性的传统假设构成了严重挑战。即使假设在有限数据集生成器中存在一种潜在的算法信息偏向于简单性,我们也表明,无论是否使用伪随机生成器,完全自动化的可计算学习算法,特别是当前机器学习(包括深度学习)方法中使用的统计性质的算法,总是会被足够大的数据集自然或人为地欺骗。特别是,我们证明,对于每个有限学习算法,都有一个足够大的数据集大小,超过该数据集,不可预测欺骗者的算法概率是任何其他较大数据集的算法概率的上界(最多一个乘法常数,仅取决于学习算法)。换句话说,与任何其他特定数据集一样,非常大和复杂的数据集也可能将学习算法欺骗成“简单泡沫”。这些欺骗性的数据集保证了任何预测都会偏离高算法复杂度的全局最优解,同时收敛到低算法复杂度的局部最优解。我们讨论了规避这种欺骗性现象的框架和经验条件,从统计机器学习转向基于算法信息理论和可计算性理论的内在力量或受其驱动的更强类型的机器学习。 摘要:When mining large datasets in order to predict new data, limitations of the principles behind statistical machine learning pose a serious challenge not only to the Big Data deluge, but also to the traditional assumptions that data generating processes are biased toward low algorithmic complexity. Even when one assumes an underlying algorithmic-informational bias toward simplicity in finite dataset generators, we show that fully automated, with or without access to pseudo-random generators, computable learning algorithms, in particular those of statistical nature used in current approaches to machine learning (including deep learning), can always be deceived, naturally or artificially, by sufficiently large datasets. In particular, we demonstrate that, for every finite learning algorithm, there is a sufficiently large dataset size above which the algorithmic probability of an unpredictable deceiver is an upper bound (up to a multiplicative constant that only depends on the learning algorithm) for the algorithmic probability of any other larger dataset. In other words, very large and complex datasets are as likely to deceive learning algorithms into a "simplicity bubble" as any other particular dataset. These deceiving datasets guarantee that any prediction will diverge from the high-algorithmic-complexity globally optimal solution while converging toward the low-algorithmic-complexity locally optimal solution. We discuss the framework and empirical conditions for circumventing this deceptive phenomenon, moving away from statistical machine learning towards a stronger type of machine learning based on, or motivated by, the intrinsic power of algorithmic information theory and computability theory.

【15】 ProBF: Learning Probabilistic Safety Certificates with Barrier Functions 标题:ProBF:使用屏障函数学习概率安全证书 链接:https://arxiv.org/abs/2112.12210

作者:Sulin Liu,Athindran Ramesh Kumar,Jaime F. Fisac,Ryan P. Adams,Peter J. Ramadge 机构:Princeton University 备注:Presented at NeurIPS 2021 workshop - Safe and Robust Control of Uncertain Systems 摘要:安全关键型应用要求控制器/策略能够以高置信度保证安全。如果能够获知真实的系统动力学,控制屏障函数(control barrier function)是保证安全的有用工具。在实践中,我们对系统动力学的认识并不准确,未建模的残差动力学可能导致不安全行为。用确定性机器学习模型学习残差动力学可以防止不安全行为,但在预测不完善时可能失效。在这种情况下,能够对自身预测不确定性进行推断的概率学习方法有助于提供稳健的安全裕量。在这项工作中,我们使用高斯过程对残差动力学在控制屏障函数上的投影进行建模,并提出一种新的优化程序来生成能以高概率保证安全的控制。该安全过滤器具备对GP预测不确定性进行推断的能力。我们通过Segway和四旋翼(Quadrotor)仿真实验证明了该方法的有效性。与基于神经网络的确定性方法相比,我们提出的概率方法能够显著减少安全违规的次数。 摘要:Safety-critical applications require controllers/policies that can guarantee safety with high confidence. The control barrier function is a useful tool to guarantee safety if we have access to the ground-truth system dynamics. In practice, we have inaccurate knowledge of the system dynamics, which can lead to unsafe behaviors due to unmodeled residual dynamics. Learning the residual dynamics with deterministic machine learning models can prevent the unsafe behavior but can fail when the predictions are imperfect. In this situation, a probabilistic learning method that reasons about the uncertainty of its predictions can help provide robust safety margins. In this work, we use a Gaussian process to model the projection of the residual dynamics onto a control barrier function. We propose a novel optimization procedure to generate safe controls that can guarantee safety with high probability. The safety filter is provided with the ability to reason about the uncertainty of the predictions from the GP. We show the efficacy of this method through experiments on Segway and Quadrotor simulations. Our proposed probabilistic approach is able to reduce the number of safety violations significantly as compared to the deterministic approach with a neural network.
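下面给出该思路在标量控制下的闭式解草图:GP 给出残差在屏障函数上投影的后验均值 mu 与标准差 sigma,安全约束中减去 kappa*sigma 作为概率裕量。这只是在若干简化假设下的示意,并非论文提出的完整优化流程。

```python
# 示意:带 GP 不确定度裕量的 CBF 安全过滤器(标量控制、闭式投影)
import numpy as np

def safe_control(u_des, Lf_h, Lg_h, h, mu, sigma, alpha=1.0, kappa=2.0):
    """求解 min (u-u_des)^2  s.t.  Lf_h + Lg_h*u + mu - kappa*sigma >= -alpha*h,
    其中 mu、sigma 为 GP 对残差投影的后验均值/标准差(概率性 CBF 约束)。"""
    slack = Lf_h + Lg_h * u_des + mu - kappa * sigma + alpha * h
    if slack >= 0:
        return u_des                       # 期望控制已(高概率)满足安全约束
    if abs(Lg_h) < 1e-9:
        raise ValueError("约束与 u 无关且被违反,无可行修正")
    return u_des - slack / Lg_h            # 投影到约束边界

print(safe_control(u_des=1.0, Lf_h=-0.5, Lg_h=0.8, h=0.1, mu=0.0, sigma=0.3))  # 1.25
```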

【16】 Deep Filtering with DNN, CNN and RNN 标题:基于DNN、CNN和RNN的深度滤波 链接:https://arxiv.org/abs/2112.12616

作者:Bin Xie,Qing Zhang 机构:University of Georgia 备注:arXiv admin note: text overlap with arXiv:2008.03878 摘要:本文研究用于线性与非线性滤波的深度学习方法。其思想是:用名义动态模型生成的蒙特卡罗样本训练神经网络,再将网络权重应用于实际动态模型的蒙特卡罗样本。本文的一个重点是三种主要神经网络结构(DNN、CNN、RNN)下的深度滤波器。我们的深度滤波器在线性情形下可与传统卡尔曼滤波器相媲美,在非线性情形下优于扩展卡尔曼滤波器。随后研究了一个带跳跃的切换模型,以展示深度滤波的自适应性与能力。在三种主要神经网络中,CNN的平均表现最好,而RNN似乎并不适合滤波问题。深度滤波器的一个优点是:当名义模型与实际模型不一致时,它仍具有鲁棒性。深度滤波的另一个优点是可以直接用真实数据训练深度神经网络,因此可以完全绕过模型校准这一步。 摘要:This paper is about a deep learning approach for linear and nonlinear filtering. The idea is to train a neural network with Monte Carlo samples generated from a nominal dynamic model. Then the network weights are applied to Monte Carlo samples from an actual dynamic model. A main focus of this paper is on the deep filters with three major neural network architectures (DNN, CNN, RNN). Our deep filter compares favorably to the traditional Kalman filter in linear cases and outperforms the extended Kalman filter in nonlinear cases. Then a switching model with jumps is studied to show the adaptiveness and power of our deep filtering. Among the three major NNs, the CNN outperforms the others on average, while the RNN does not seem to be suitable for the filtering problem. One advantage of the deep filter is its robustness when the nominal model and actual model differ. The other advantage of deep filtering is that real data can be used directly to train the deep neural network. Therefore, model calibration can be bypassed altogether.
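其训练范式可以用一个玩具线性高斯例子直观说明:先用名义模型做蒙特卡罗采样,生成(观测窗口, 真实状态)对,再训练一个小网络做回归。以下为示意实现(模型与超参数均为假设取值),并非论文的实验设置。

```python
# 示意:在名义模型的蒙特卡罗样本上训练"深度滤波器"
import numpy as np
import torch
import torch.nn as nn

def simulate(n_paths=2000, T=50, a=0.9, q=0.1, r=0.5, rng=np.random.default_rng(0)):
    x = np.zeros((n_paths, T)); y = np.zeros((n_paths, T))
    for t in range(1, T):
        x[:, t] = a * x[:, t-1] + q * rng.standard_normal(n_paths)   # 状态方程
        y[:, t] = x[:, t] + r * rng.standard_normal(n_paths)         # 观测方程
    return x, y

x, y = simulate()
window = 10
inputs = torch.tensor(y[:, -window:], dtype=torch.float32)   # 最近 10 步观测
targets = torch.tensor(x[:, -1:], dtype=torch.float32)       # 当前真实状态

net = nn.Sequential(nn.Linear(window, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
for _ in range(500):
    opt.zero_grad()
    loss = nn.functional.mse_loss(net(inputs), targets)
    loss.backward()
    opt.step()
# 训练好的权重随后可直接用于"实际模型"生成的观测序列
```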

【17】 Equivariance and generalization in neural networks 标题:神经网络中的等变性与泛化 链接:https://arxiv.org/abs/2112.12493

作者:Srinath Bulusu,Matteo Favoni,Andreas Ipp,David I. Müller,Daniel Schuh 机构:Institute for Theoretical Physics, TU Wien, Vienna, Austria 备注:8 pages, 7 figures, proceedings for the 14th Quark Confinement and the Hadron Spectrum Conference (vConf2021) 摘要:高能物理与格点场论中基础对称性所起的关键作用,要求把这些对称性实现到应用于所研究物理系统的神经网络结构中。在本会议论文中,我们重点讨论在网络性质中引入平移等变性的影响,特别是在性能与泛化方面。我们以复标量场理论为例展示等变网络的优势,并在其上考察了多种回归与分类任务。为了进行有意义的比较,我们通过系统搜索确定了有前景的等变与非等变结构。结果表明,在大多数任务中,我们最好的等变结构在性能与泛化上都显著优于对应的非等变结构;这不仅适用于超出训练集所含范围的物理参数,也适用于不同的格子尺寸。 摘要:The crucial role played by the underlying symmetries of high energy physics and lattice field theories calls for the implementation of such symmetries in the neural network architectures that are applied to the physical system under consideration. In these proceedings, we focus on the consequences of incorporating translational equivariance among the network properties, particularly in terms of performance and generalization. The benefits of equivariant networks are exemplified by studying a complex scalar field theory, on which various regression and classification tasks are examined. For a meaningful comparison, promising equivariant and non-equivariant architectures are identified by means of a systematic search. The results indicate that in most of the tasks our best equivariant architectures can perform and generalize significantly better than their non-equivariant counterparts, which applies not only to physical parameters beyond those represented in the training set, but also to different lattice sizes.

【18】 Generalization capabilities of neural networks in lattice applications 标题:格点应用中神经网络的泛化能力 链接:https://arxiv.org/abs/2112.12474

作者:Srinath Bulusu,Matteo Favoni,Andreas Ipp,David I. Müller,Daniel Schuh 机构:Institute for Theoretical Physics; Massachusetts Institute of Technology 备注:10 pages, 7 figures, proceedings for the 38th International Symposium on Lattice Field Theory (LATTICE21) 摘要:近年来,机器学习在格点场论中的应用越来越流行。这类理论的一个基本要素是对称性;将其纳入神经网络的性质中,可在性能与泛化能力方面带来高回报。具有周期边界条件的格点上的物理系统通常具有的一个基本对称性,是时空平移下的等变性。在这里,我们研究采用平移等变神经网络相对于非等变网络的优势。我们考虑的系统是通量表示下二维格子上带四次相互作用的复标量场,网络在其上执行多种回归与分类任务。通过系统搜索,确定了有前景的等变与非等变结构。我们证明,在大多数任务中,我们最好的等变结构在性能与泛化上都显著优于对应的非等变结构;这不仅适用于超出训练集所含范围的物理参数,也适用于不同的格子尺寸。 摘要:In recent years, the use of machine learning has become increasingly popular in the context of lattice field theories. An essential element of such theories is represented by symmetries, whose inclusion in the neural network properties can lead to high reward in terms of performance and generalizability. A fundamental symmetry that usually characterizes physical systems on a lattice with periodic boundary conditions is equivariance under spacetime translations. Here we investigate the advantages of adopting translationally equivariant neural networks in favor of non-equivariant ones. The system we consider is a complex scalar field with quartic interaction on a two-dimensional lattice in the flux representation, on which the networks carry out various regression and classification tasks. Promising equivariant and non-equivariant architectures are identified with a systematic search. We demonstrate that in most of these tasks our best equivariant architectures can perform and generalize significantly better than their non-equivariant counterparts, which applies not only to physical parameters beyond those represented in the training set, but also to different lattice sizes.
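这两篇摘要中的"平移等变"可以用一个数值检验直观呈现:对周期边界(circular padding)的卷积层,先卷积后平移与先平移后卷积结果相同。下例为一个自足的 PyTorch 演示。

```python
# 数值演示:周期边界卷积对格点平移是等变的
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 4, kernel_size=3, padding=1, padding_mode="circular", bias=False)
field = torch.randn(1, 1, 16, 16)                # 周期格点上的标量场构型
shift = (3, 5)

out_then_shift = torch.roll(conv(field), shifts=shift, dims=(2, 3))
shift_then_out = conv(torch.roll(field, shifts=shift, dims=(2, 3)))
print(torch.allclose(out_then_shift, shift_then_out, atol=1e-6))   # True:等变
```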

【19】 Calabi-Yau Metrics, Energy Functionals and Machine-Learning 标题:Calabi-Yau度量、能量泛函与机器学习 链接:https://arxiv.org/abs/2112.10872

作者:Anthony Ashmore,Lucille Calmon,Yang-Hui He,Burt A. Ovrut 机构:Enrico Fermi Institute & Kadanoff Center for Theoretical Physics, University of Chicago, IL, USA; Sorbonne Université, Laboratoire de Physique Théorique et Hautes Energies, Paris, France 备注:7 pages, 5 figures 摘要:我们把机器学习应用于寻找数值Calabi-Yau度量的问题。我们将以往关于学习用Donaldson算法计算的近似Ricci平坦度量的工作,扩展到Headrick和Nassar精确得多的“最优”度量。我们证明,机器学习在只见过一小部分训练数据的情况下,就能够预测Calabi-Yau度量的Kähler势。 摘要:We apply machine learning to the problem of finding numerical Calabi-Yau metrics. We extend previous work on learning approximate Ricci-flat metrics calculated using Donaldson's algorithm to the much more accurate "optimal" metrics of Headrick and Nassar. We show that machine learning is able to predict the Kähler potential of a Calabi-Yau metric having seen only a small sample of training data.

其他(11篇)

【1】 Latent Time Neural Ordinary Differential Equations 标题:潜时神经常微分方程 链接:https://arxiv.org/abs/2112.12728

作者:Srinivas Anumasa,P. K. Srijith 机构:Computer Science and Engineering, Indian Institute of Technology Hyderabad, India 备注:Accepted at AAAI-2022 摘要:神经常微分方程(NODE)是对残差网络(ResNets)等常用深度学习模型的一种连续深度推广。它们具有参数效率,并在一定程度上将深度学习中的模型选择过程自动化。然而,它们缺乏急需的不确定性建模与鲁棒性能力,而这对其在自动驾驶、医疗等实际应用中的使用至关重要。我们提出了一种新颖独特的方法:通过考虑ODE求解器终止时间$T$上的分布来对NODE中的不确定性进行建模。所提出的潜时NODE(LT-NODE)方法将$T$视为潜变量,并应用贝叶斯学习从数据中得到$T$的后验分布。具体地,我们使用变分推理来学习近似后验和模型参数。预测时考虑来自不同后验样本的NODE表示,且可通过单次前向传播高效完成。由于$T$隐式地定义了NODE的深度,$T$上的后验分布也有助于NODE中的模型选择。我们还提出自适应潜时NODE(ALT-NODE),它允许每个数据点在终止时间上拥有各自不同的后验分布。ALT-NODE使用摊销变分推理(amortized variational inference),借助推理网络学习近似后验。通过在合成数据和若干真实图像分类数据上的实验,我们证明了所提方法在不确定性建模与鲁棒性方面的有效性。 摘要:Neural ordinary differential equations (NODE) have been proposed as a continuous depth generalization to popular deep learning models such as Residual networks (ResNets). They provide parameter efficiency and automate the model selection process in deep learning models to some extent. However, they lack the much-required uncertainty modelling and robustness capabilities which are crucial for their use in several real-world applications such as autonomous driving and healthcare. We propose a novel and unique approach to model uncertainty in NODE by considering a distribution over the end-time $T$ of the ODE solver. The proposed approach, latent time NODE (LT-NODE), treats $T$ as a latent variable and apply Bayesian learning to obtain a posterior distribution over $T$ from the data. In particular, we use variational inference to learn an approximate posterior and the model parameters. Prediction is done by considering the NODE representations from different samples of the posterior and can be done efficiently using a single forward pass. As $T$ implicitly defines the depth of a NODE, posterior distribution over $T$ would also help in model selection in NODE. We also propose, adaptive latent time NODE (ALT-NODE), which allow each data point to have a distinct posterior distribution over end-times. ALT-NODE uses amortized variational inference to learn an approximate posterior using inference networks. We demonstrate the effectiveness of the proposed approaches in modelling uncertainty and robustness through experiments on synthetic and several real-world image classification data.
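预测流程可以草绘如下:从 $T$ 的(近似)后验采样若干终止时间,对所有采样时刻做一次前向积分,再平均各时刻的预测。草图依赖 torchdiffeq 库;$q(T)$ 的对数正态形式与各维度均为示意性假设,论文中该后验由变分推理学得。

```python
# 示意:LT-NODE 式预测——对采样的多个终止时间一次积分、平均预测
import torch
import torch.nn as nn
from torchdiffeq import odeint

class ODEFunc(nn.Module):
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, 64), nn.Tanh(), nn.Linear(64, dim))
    def forward(self, t, h):
        return self.net(h)

func, classifier = ODEFunc(), nn.Linear(16, 10)
h0 = torch.randn(8, 16)                        # 输入特征(batch=8)
# 假设 q(T) 为对数正态(参数为示意取值)
T_samples = torch.sort(torch.exp(0.1 * torch.randn(5))).values
t_grid = torch.cat([torch.zeros(1), T_samples])
states = odeint(func, h0, t_grid)              # 一次积分得到所有采样时刻的状态
probs = torch.stack([classifier(s).softmax(-1) for s in states[1:]]).mean(0)
```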

【2】 Preprocessing in Inductive Logic Programming 标题:归纳逻辑程序设计中的预处理 链接:https://arxiv.org/abs/2112.12551

作者:Brad Hunter 机构:Linacre College, University of Oxford, A dissertation submitted for the degree of Master of Mathematics and Foundations of Computer Science 备注:91 pages, 6 figures, Masters thesis 摘要:归纳逻辑编程是一类机器学习方法,它从示例中学习逻辑程序。这种学习通常相对于以逻辑程序形式给出的背景知识进行。本文引入了底部预处理(bottom preprocessing),这是一种对ILP系统必须考虑的候选程序生成初始约束的方法。底部预处理将逆蕴涵(inverse entailment)的思想应用于现代ILP系统。逆蕴涵是随Progol提出的一种影响深远的早期ILP方法。本文还给出了$\bot$-Popper,即底部预处理在现代ILP系统Popper上的一个实现。实验表明,底部预处理可以缩短ILP系统在困难问题上的学习时间;当问题中的背景知识量很大时,这种缩短尤为显著。 摘要:Inductive logic programming is a type of machine learning in which logic programs are learned from examples. This learning typically occurs relative to some background knowledge provided as a logic program. This dissertation introduces bottom preprocessing, a method for generating initial constraints on the programs an ILP system must consider. Bottom preprocessing applies ideas from inverse entailment to modern ILP systems. Inverse entailment is an influential early ILP approach introduced with Progol. This dissertation also presents $\bot$-Popper, an implementation of bottom preprocessing for the modern ILP system Popper. It is shown experimentally that bottom preprocessing can reduce learning times of ILP systems on hard problems. This reduction can be especially significant when the amount of background knowledge in the problem is large.

【3】 Emulation of greenhouse-gas sensitivities using variational autoencoders 标题:利用变分自动编码器模拟温室气体敏感性 链接:https://arxiv.org/abs/2112.12524

作者:Laura Cartwright,Andrew Zammit-Mangion,Nicholas M. Deutscher 机构:School of Mathematics and Applied Statistics, University of Wollongong, Wollongong; Centre for Atmospheric Chemistry, School of Earth, Atmospheric and Life Sciences, University of Wollongong, Wollongong, Australia 备注:25 pages, 8 figures, 2 tables, data & code available 摘要:通量反演是指根据气体摩尔分数的观测来确定气体源与汇的过程。反演通常需要运行拉格朗日粒子扩散模型(LPDM),以在感兴趣的空间域上生成观测值与通量之间的灵敏度。对于每一次气体测量,LPDM都必须在时间上反向运行,其计算代价可能高得令人望而却步。为解决这一问题,我们开发了一种新的LPDM灵敏度时空仿真器,它基于卷积变分自动编码器(CVAE)构建。利用CVAE的编码器部分,我们得到低维空间中潜变量的近似(变分)后验分布;然后在该低维空间上使用时空高斯过程仿真器,在预测位置和时间点上仿真新的变量;仿真得到的变量再经由CVAE的解码器部分生成仿真灵敏度。我们表明,基于CVAE的仿真器优于用经验正交函数构建的更传统的仿真器,并且可用于不同的LPDM。我们的结论是,这种基于仿真的方法能够可靠地减少生成LPDM输出所需的计算时间,从而用于高分辨率通量反演。 摘要:Flux inversion is the process by which sources and sinks of a gas are identified from observations of gas mole fraction. The inversion often involves running a Lagrangian particle dispersion model (LPDM) to generate sensitivities between observations and fluxes over a spatial domain of interest. The LPDM must be run backward in time for every gas measurement, and this can be computationally prohibitive. To address this problem, here we develop a novel spatio-temporal emulator for LPDM sensitivities that is built using a convolutional variational autoencoder (CVAE). With the encoder segment of the CVAE, we obtain approximate (variational) posterior distributions over latent variables in a low-dimensional space. We then use a spatio-temporal Gaussian process emulator on the low-dimensional space to emulate new variables at prediction locations and time points. Emulated variables are then passed through the decoder segment of the CVAE to yield emulated sensitivities. We show that our CVAE-based emulator outperforms the more traditional emulator built using empirical orthogonal functions and that it can be used with different LPDMs. We conclude that our emulation-based approach can be used to reliably reduce the computing time needed to generate LPDM outputs for use in high-resolution flux inversions.

【4】 Using Sequential Statistical Tests to Improve the Performance of Random Search in hyperparameter Tuning 标题:利用序贯统计检验提高随机搜索在超参数整定中的性能 链接:https://arxiv.org/abs/2112.12438

作者:Philip Buczak,Daniel Horn 机构:Department of Statistics, TU Dortmund University, Vogelpothsweg, Dortmund, Germany 摘要:超参数调优是机器学习中最耗时的环节之一:必须评估大量不同超参数设置的性能才能找到最优者。尽管已有使所需评估次数最少化的现代优化算法,单个设置的评估仍然昂贵:使用重采样技术,机器学习方法必须在不同的训练数据集上拟合固定的$K$次,并以$K$次拟合的平均值作为该设置性能的估计。许多超参数设置其实在不足$K$次重采样迭代后就可以被淘汰,因为它们已明显劣于高性能设置;然而在实践中,重采样往往一直执行到最后,浪费了大量计算。我们建议使用序贯检验程序来最小化检出劣质参数设置所需的重采样迭代次数。为此,我们首先分析重采样误差的分布,发现对数正态分布是一个有希望的候选;随后基于该分布假设构建序贯检验程序,并将其用于随机搜索算法中。我们在一些贴近实际的数据情形下比较了标准随机搜索与增强的序贯随机搜索。结果表明,序贯随机搜索能够找到同样好的超参数设置,而找到这些设置所需的计算时间大约减半。 摘要:Hyperparameter tuning is one of the most time-consuming parts in machine learning: The performance of a large number of different hyperparameter settings has to be evaluated to find the best one. Although modern optimization algorithms exist that minimize the number of evaluations needed, the evaluation of a single setting is still expensive: Using a resampling technique, the machine learning method has to be fitted a fixed number of $K$ times on different training data sets. As an estimator for the performance of the setting the respective mean value of the $K$ fits is used. Many hyperparameter settings could be discarded after less than $K$ resampling iterations, because they already are clearly inferior to high performing settings. However, in practice, the resampling is often performed until the very end, wasting a lot of computational effort. We propose to use a sequential testing procedure to minimize the number of resampling iterations to detect inferior parameter settings. To do so, we first analyze the distribution of resampling errors and find that a log-normal distribution is promising. Afterwards, we build a sequential testing procedure assuming this distribution. This sequential test procedure is utilized within a random search algorithm. We compare a standard random search with our enhanced sequential random search in some realistic data situations. It can be shown that the sequential random search is able to find comparably good hyperparameter settings; however, the computational time needed to find those settings is roughly halved.
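其核心想法可以用几行代码勾勒:逐折评估一个候选设置,每得到一折误差就与当前最优设置的对数误差做一次单侧检验,显著更差则立即放弃剩余折。下例以两样本 t 检验示意;论文采用的具体序贯检验及其对重复检验的校正方式以原文为准。

```python
# 示意:按对数正态假设对劣质超参数设置做序贯早停
import numpy as np
from scipy import stats

def sequential_evaluate(fold_error_fn, K=10, best_log_errors=None, alpha=0.05):
    """fold_error_fn(k) 返回第 k 折的误差;best_log_errors 为现任最优设置
    完整 K 折的对数误差。注:反复检验会抬高总体显著性水平,严格的
    序贯程序需要对 alpha 做相应分配/校正。"""
    log_errors = []
    for k in range(K):
        log_errors.append(np.log(fold_error_fn(k)))
        if best_log_errors is not None and len(log_errors) >= 3:
            t, p = stats.ttest_ind(log_errors, best_log_errors,
                                   alternative="greater")
            if p < alpha:                  # 已显著更差:停止评估该设置
                return None, k + 1
    return float(np.mean(log_errors)), K
```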

【5】 Sparse-softmax: A Simpler and Faster Alternative Softmax Transformation 标题:Sparse-Softmax:一种更简单、更快速的替代Softmax变换 链接:https://arxiv.org/abs/2112.12433

作者:Shaoshi Sun,Zhenyuan Zhang,BoCheng Huang,Pengbin Lei,Jianlin Su,Shengfeng Pan,Jiarun Cao 机构:School of Computer Science and Informatics, Cardiff University, the United Kingdom; Department of Economics, Osaka City University, Japan; School of Software Engineering, Beijing Jiaotong University, China 摘要:softmax函数广泛应用于人工神经网络中的多类分类问题:softmax变换强制输出为正且和为一,相应的损失函数允许使用极大似然原理来优化模型。然而,在高维分类中,softmax为损失函数的优化留下了过大的余地,这在一定程度上导致了性能不佳。在本文中,我们对一种简单简洁的softmax变体——稀疏softmax(sparse-softmax)——进行了实证研究,以缓解传统softmax在高维分类问题上出现的问题。我们在多个跨学科任务上评估了我们的方法,实验结果表明,稀疏softmax比基线模型更简单、更快,并产生更好的结果。 摘要:The softmax function is widely used in artificial neural networks for the multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to use maximum likelihood principle to optimize the model. However, softmax leaves a large margin for loss function to conduct optimizing operation when it comes to high-dimensional classification, which results in low-performance to some extent. In this paper, we provide an empirical study on a simple and concise softmax variant, namely sparse-softmax, to alleviate the problem that occurred in traditional softmax in terms of high-dimensional classification problems. We evaluate our approach in several interdisciplinary tasks, the experimental results show that sparse-softmax is simpler, faster, and produces better results than the baseline models.
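摘要未给出 sparse-softmax 的精确定义;下面给出文献中一种常见的"top-k 稀疏 softmax"构造作为概念参照(未必与论文的定义一致):只在最大的 k 个 logit 上归一化,其余概率置零。

```python
# 示意:top-k 稀疏 softmax(一种常见构造,未必等同于论文定义)
import torch

def sparse_softmax(logits, k=5):
    topk_vals, topk_idx = logits.topk(k, dim=-1)
    probs = torch.zeros_like(logits)
    probs.scatter_(-1, topk_idx, torch.softmax(topk_vals, dim=-1))
    return probs                                   # 高维分类时避免概率质量过度分散

p = sparse_softmax(torch.randn(2, 10000), k=5)     # 每行仅 5 个非零概率,且和为 1
```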

【6】 Mitigating Leakage from Data Dependent Communications in Decentralized Computing using Differential Privacy 标题:利用差分隐私减轻去中心化计算中数据相关通信的泄漏 链接:https://arxiv.org/abs/2112.12411

作者:Riad Ladjel,Nicolas Anciaux,Aurélien Bellet,Guillaume Scerri 机构:Petrus team, Inria, France; Magnet team, Inria, France 摘要:设想一群公民为了共同利益,愿意集体贡献个人数据,通过数据分析或机器学习计算产生对社会有用的信息。将原始个人数据共享给执行计算的中心化服务器,可能引发对隐私以及大规模监控风险的担忧。作为替代,公民们可以彼此信任并信任自己的设备,参与一次去中心化计算,协作生成一个待共享的聚合数据发布。在安全计算节点于运行时通过安全信道交换消息的场景下,一个关键的安全问题是防范观察流量的外部攻击者,因为通信流量对数据的依赖性可能泄露个人信息。现有解决方案面向云环境设计,目标是隐藏底层数据集的所有属性,并未解决上述场景中出现的特定隐私与效率挑战。在本文中,我们定义了一个通用执行模型来控制用户侧去中心化计算中通信的数据依赖性;在该模型中,全局执行计划中通信模式的差分隐私保证,可以通过组合在本地节点簇上获得的保证来分析。我们提出了一组算法,允许在隐私、效用与效率之间进行权衡。我们的形式化隐私保证利用并扩展了关于洗牌(shuffling)隐私放大的最新结果。我们用两个具有数据相关通信的去中心化执行计划的代表性示例来说明该方案的实用性。 摘要:Imagine a group of citizens willing to collectively contribute their personal data for the common good to produce socially useful information, resulting from data analytics or machine learning computations. Sharing raw personal data with a centralized server performing the computation could raise concerns about privacy and a perceived risk of mass surveillance. Instead, citizens may trust each other and their own devices to engage into a decentralized computation to collaboratively produce an aggregate data release to be shared. In the context of secure computing nodes exchanging messages over secure channels at runtime, a key security issue is to protect against external attackers observing the traffic, whose dependence on data may reveal personal information. Existing solutions are designed for the cloud setting, with the goal of hiding all properties of the underlying dataset, and do not address the specific privacy and efficiency challenges that arise in the above context. In this paper, we define a general execution model to control the data-dependence of communications in user-side decentralized computations, in which differential privacy guarantees for communication patterns in global execution plans can be analyzed by combining guarantees obtained on local clusters of nodes. We propose a set of algorithms which allow to trade-off between privacy, utility and efficiency. Our formal privacy guarantees leverage and extend recent results on privacy amplification by shuffling. We illustrate the usefulness of our proposal on two representative examples of decentralized execution plans with data-dependent communications.

【7】 Selective Multiple Power Iteration: from Tensor PCA to gradient-based exploration of landscapes 标题:选择性多重幂迭代:从张量PCA到基于梯度的景观探索 链接:https://arxiv.org/abs/2112.12306

作者:Mohamed Ouerfelli,Mohamed Tamaazousti,Vincent Rivasseau 机构:Université Paris-Saclay, CEA, List, Palaiseau, France; Université Paris-Saclay, CNRS/IN2P3, IJCLab, Orsay, France 摘要:我们提出选择性多重幂迭代(SMPI),这是一种求解重要的张量PCA问题的新算法:恢复被高斯噪声张量$\mathbf{Z}\in(\mathbb{R}^n)^{\otimes k}$污染的尖峰$\mathbf{v}_0^{\otimes k}$,其中$\mathbf{T}=\sqrt{n}\,\beta\,\mathbf{v}_0^{\otimes k}+\mathbf{Z}$,$\beta$为信噪比(SNR)。SMPI生成多项式数量的随机初始化,对每个初始化执行多项式数量的对称化张量幂迭代,然后选出使$\langle\mathbf{T},\mathbf{v}^{\otimes k}\rangle$最大的那个。在通常考虑的$n\leq 1000$范围内,$k=3$的各种数值模拟表明,SMPI的实验性能大幅超越现有算法,并可与理论最优恢复相当。我们表明,这些出人意料的性能源于一种强有力的机制:噪声在信号恢复中扮演了关键角色,且该机制发生在低$\beta$区间。此外,该机制来自SMPI区别于以往基于幂迭代算法的五个基本特性。这些显著结果可能对张量PCA的实际与理论应用产生重大影响。(i)我们给出该算法的一个变体来处理低秩CP张量分解;所提算法即使在真实数据上也优于现有方法,对实际应用具有巨大的潜在影响。(ii)我们对SMPI和梯度下降方法在高维非凸景观(出现在多种机器学习问题中)中优化的行为给出了新的理论见解。(iii)我们期望这些结果有助于关于所猜想的统计-算法差距(statistical-algorithmic gap)是否存在的讨论。 摘要:We propose Selective Multiple Power Iterations (SMPI), a new algorithm to address the important Tensor PCA problem that consists in recovering a spike $\bf{v_0}^{\otimes k}$ corrupted by a Gaussian noise tensor $\bf{Z} \in (\mathbb{R}^n)^{\otimes k}$ such that $\bf{T}=\sqrt{n} \beta \bf{v_0}^{\otimes k} + \bf{Z}$ where $\beta$ is the signal-to-noise ratio (SNR). SMPI consists in generating a polynomial number of random initializations, performing a polynomial number of symmetrized tensor power iterations on each initialization, then selecting the one that maximizes $\langle \bf{T}, \bf{v}^{\otimes k} \rangle$. Various numerical simulations for $k=3$ in the conventionally considered range $n \leq 1000$ show that the experimental performances of SMPI improve drastically upon existent algorithms and becomes comparable to the theoretical optimal recovery. We show that these unexpected performances are due to a powerful mechanism in which the noise plays a key role for the signal recovery and that takes place at low $\beta$. Furthermore, this mechanism results from five essential features of SMPI that distinguish it from previous algorithms based on power iteration. These remarkable results may have strong impact on both practical and theoretical applications of Tensor PCA. (i) We provide a variant of this algorithm to tackle low-rank CP tensor decomposition. These proposed algorithms also outperforms existent methods even on real data which shows a huge potential impact for practical applications. (ii) We present new theoretical insights on the behavior of SMPI and gradient descent methods for the optimization in high-dimensional non-convex landscapes that are present in various machine learning problems. (iii) We expect that these results may help the discussion concerning the existence of the conjectured statistical-algorithmic gap.
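摘要本身几乎完整给出了 SMPI 的主循环;下面按其描述给出 $k=3$ 的 NumPy 草图。初始化与迭代次数为示意取值,且为简洁起见省略了对噪声张量的对称化处理。

```python
# 按摘要描述的 SMPI 主循环草图(k=3)
import numpy as np

def smpi(T, n_init=50, n_iter=50, rng=np.random.default_rng(0)):
    n = T.shape[0]
    best_v, best_score = None, -np.inf
    for _ in range(n_init):                          # 多组随机初始化
        v = rng.standard_normal(n)
        v /= np.linalg.norm(v)
        for _ in range(n_iter):                      # 张量幂迭代 v <- T(., v, v)
            v = np.einsum("ijk,j,k->i", T, v, v)
            v /= np.linalg.norm(v)
        score = np.einsum("ijk,i,j,k->", T, v, v, v) # <T, v⊗3>
        if score > best_score:
            best_v, best_score = v, score
    return best_v

# 玩具实验:n=30、beta=6 的尖峰恢复(恢复与否取决于信噪比)
n, beta = 30, 6.0
rng = np.random.default_rng(1)
v0 = rng.standard_normal(n); v0 /= np.linalg.norm(v0)
T = np.sqrt(n) * beta * np.einsum("i,j,k->ijk", v0, v0, v0) \
    + rng.standard_normal((n, n, n))
print(abs(smpi(T) @ v0))                             # 接近 1 表示恢复成功
```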

【8】 A Robust Initialization of Residual Blocks for Effective ResNet Training without Batch Normalization 标题:不需要批量归一化的有效ResNet训练的残差块的鲁棒初始化 链接:https://arxiv.org/abs/2112.12299

作者:Enrico Civitelli,Alessio Sortino,Matteo Lapucci,Francesco Bagattini,Giulio Galvan 机构:Dipartimento di Ingegneria dell’Informazione, Università degli Studi di Firenze, Via di S. Marta , Firenze, Italy, Flair Tech 备注:13 pages (2 pages of supplementary material), 8 figures, 1 table 摘要:批量标准化是所有最先进的神经网络体系结构的重要组成部分。然而,由于它引入了许多实际问题,最近的许多研究都致力于设计无规范化的体系结构。在本文中,我们证明了权值初始化是训练类似ResNet的无规范化网络的关键。特别是,我们建议对跳转连接分支的块输出的求和操作进行轻微修改,以便正确初始化整个网络。我们表明,这种改进的体系结构在CIFAR-10上取得了有竞争力的结果,而无需进一步的正则化或算法修改。 摘要:Batch Normalization is an essential component of all state-of-the-art neural networks architectures. However, since it introduces many practical issues, much recent research has been devoted to designing normalization-free architectures. In this paper, we show that weights initialization is key to train ResNet-like normalization-free networks. In particular, we propose a slight modification to the summation operation of a block output to the skip connection branch, so that the whole network is correctly initialized. We show that this modified architecture achieves competitive results on CIFAR-10 without further regularization nor algorithmic modifications.
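摘要只说明"对块输出汇入跳连分支的求和操作做轻微修改"而未给出细节;下面的草图展示文献中一类常见的实现思路(SkipInit 式:残差分支乘以初始为零的可学习标量,使整网在初始化时等价于恒等映射),仅作概念参照,论文的具体修改以原文为准。

```python
# 示意:带可学习缩放标量的残差块求和(零初始化 => 初始时为恒等映射)
import torch
import torch.nn as nn

class ScaledResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.f = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )
        self.alpha = nn.Parameter(torch.zeros(1))   # 初始为 0:块输出 == 输入

    def forward(self, x):
        return x + self.alpha * self.f(x)           # 修改后的求和操作

y = ScaledResidualBlock(16)(torch.randn(2, 16, 32, 32))
```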

【9】 The Universal $\ell^p$-Metric on Merge Trees 标题:合并树上的通用$\ell^p$度量 链接:https://arxiv.org/abs/2112.12165

作者:Robert Cardona,Justin Curry,Tung Lam,Michael Lesnick 机构:University at Albany, State University of New York (SUNY) 备注:20 pages 摘要:我们改造Bjerkevik和Lesnick针对多参数持久模块给出的定义,在合并树上引入了交错距离的$\ell^p$型扩展。我们证明了该距离是一个度量,并且它是相应条形码之间$p$-Wasserstein距离的上界。对于每一个$p\in[1,\infty]$,我们证明该距离相对于胞腔子水平集过滤(cellular sublevel filtrations)是稳定的,并且它是满足该稳定性性质的通用(即最大)距离。在$p=\infty$的情形下,这为合并树上交错距离的通用性给出了一个新的证明。 摘要:Adapting a definition given by Bjerkevik and Lesnick for multiparameter persistence modules, we introduce an $\ell^p$-type extension of the interleaving distance on merge trees. We show that our distance is a metric, and that it upper-bounds the $p$-Wasserstein distance between the associated barcodes. For each $p\in[1,\infty]$, we prove that this distance is stable with respect to cellular sublevel filtrations and that it is the universal (i.e., largest) distance satisfying this stability property. In the $p=\infty$ case, this gives a novel proof of universality for the interleaving distance on merge trees.

【10】 Beyond Low Earth Orbit: Biomonitoring, Artificial Intelligence, and Precision Space Health 标题:在近地轨道之外:生物监测、人工智能和精密空间健康 链接:https://arxiv.org/abs/2112.12554

作者:Ryan T. Scott,Erik L. Antonsen,Lauren M. Sanders,Jaden J. A. Hastings,Seung-min Park,Graham Mackintosh,Robert J. Reynolds,Adrienne L. Hoarfrost,Aenor Sawyer,Casey S. Greene,Benjamin S. Glicksberg,Corey A. Theriot,Daniel C. Berrios,Jack Miller,Joel Babdor,Richard Barker,Sergio E. Baranzini,Afshin Beheshti,Stuart Chalk,Guillermo M. Delgado-Aparicio,Melissa Haendel,Arif A. Hamid,Philip Heller,Daniel Jamieson,Katelyn J. Jarvis,John Kalantari,Kia Khezeli,Svetlana V. Komarova,Matthieu Komorowski,Prachi Kothiyal,Ashish Mahabal,Uri Manor,Hector Garcia Martin,Christopher E. Mason,Mona Matar,George I. Mias,Jerry G. Myers, Jr.,Charlotte Nelson,Jonathan Oribello,Patricia Parsons-Wingerter,R. K. Prabhu,Amina Ann Qutub,Jon Rask,Amanda Saravia-Butler,Suchi Saria,Nitin Kumar Singh,Frank Soboczenski,Michael Snyder,Karthik Soman,David Van Valen,Kasthuri Venkateswaran,Liz Warren,Liz Worthey,Jason H. Yang,Marinka Zitnik,Sylvain V. Costes 机构:KBR, Space Biosciences Division, NASA Ames Research Center, Moffett Field, CA , USA., Department of Emergency Medicine, Center for Space Medicine, Baylor College of Medicine, Houston 备注:31 pages, 4 figures 摘要:低地球轨道以外的人类空间探索将涉及远距离和持续时间较长的任务。为了有效缓解无数的空间健康危害,数据和空间健康系统的范式转变是实现地球独立而非依赖地球的必要条件。人工智能和机器学习在生物学和健康领域的有希望的发展可以满足这些需求。我们提出了一个适当的自主和智能精确空间健康系统,该系统将监测、汇总和评估生物医学状态;分析和预测个性化不良健康结果;适应和应对新积累的数据;并为每个深空机组成员提供预防性、可操作性和及时的见解,并为其机组医务人员提供迭代决策支持。在这里,我们总结了美国国家航空和航天局组织的一个讲习班提出的关于人工智能在空间生物学和健康方面的未来应用的建议。在未来十年中,生物监测技术、生物标志物科学、航天器硬件、智能软件和流线型数据管理必须成熟,并融合到精密的空间健康系统中,以使人类能够在深空中茁壮成长。 摘要:Human space exploration beyond low Earth orbit will involve missions of significant distance and duration. To effectively mitigate myriad space health hazards, paradigm shifts in data and space health systems are necessary to enable Earth-independence, rather than Earth-reliance. Promising developments in the fields of artificial intelligence and machine learning for biology and health can address these needs. We propose an appropriately autonomous and intelligent Precision Space Health system that will monitor, aggregate, and assess biomedical statuses; analyze and predict personalized adverse health outcomes; adapt and respond to newly accumulated data; and provide preventive, actionable, and timely insights to individual deep space crew members and iterative decision support to their crew medical officer. Here we present a summary of recommendations from a workshop organized by the National Aeronautics and Space Administration, on future applications of artificial intelligence in space biology and health. In the next decade, biomonitoring technology, biomarker science, spacecraft hardware, intelligent software, and streamlined data management must mature and be woven together into a Precision Space Health system to enable humanity to thrive in deep space.

【11】 Surrogate Likelihoods for Variational Annealed Importance Sampling 标题:变分退火重要性采样的代理似然 链接:https://arxiv.org/abs/2112.12194

作者:Martin Jankowiak,Du Phan 机构:Broad Institute 备注:20 pages 摘要:变分推理是一种强大的近似贝叶斯推理范式,具有许多吸引人的特性,包括支持模型学习和数据子采样。相比之下,像哈密顿蒙特卡罗这样的MCMC方法不具备这些特性,但仍然具有吸引力,因为与参数化方法相反,MCMC是渐近无偏的。出于这些原因,研究者试图结合这两类算法的优点,最近的方法已更接近在实践中实现这一愿景。然而,在这些混合方法中支持数据子采样可能是一个挑战;我们通过引入一种可与其他变分参数联合学习的代理似然(surrogate likelihood)来弥补这一不足。我们从理论上论证,所得算法允许用户在推理保真度与计算成本之间做直观的权衡。在大量的实证比较中,我们表明该方法在实践中表现良好,并且非常适合概率编程框架中的黑盒推理。 摘要:Variational inference is a powerful paradigm for approximate Bayesian inference with a number of appealing properties, including support for model learning and data subsampling. By contrast MCMC methods like Hamiltonian Monte Carlo do not share these properties but remain attractive since, contrary to parametric methods, MCMC is asymptotically unbiased. For these reasons researchers have sought to combine the strengths of both classes of algorithms, with recent approaches coming closer to realizing this vision in practice. However, supporting data subsampling in these hybrid methods can be a challenge, a shortcoming that we address by introducing a surrogate likelihood that can be learned jointly with other variational parameters. We argue theoretically that the resulting algorithm permits the user to make an intuitive trade-off between inference fidelity and computational cost. In an extensive empirical comparison we show that our method performs well in practice and that it is well-suited for black-box inference in probabilistic programming frameworks.

机器翻译,仅供参考
