
Machine Learning arXiv Daily Digest [9.7]

Author: arXiv每日学术速递 (WeChat official account)
Published: 2021-09-16 16:31:09

Update! The H5 site now supports collapsible abstracts for a better reading experience. Visit arxivdaily.com for coverage of CS | Physics | Math | Economics | Statistics | Finance | Biology | Electrical Engineering, plus search, favorites, and more!

cs.LG: 137 papers today.

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (8 papers)

【1】 Knowledge Graph Enhanced Event Extraction in Financial Documents
Link: https://arxiv.org/abs/2109.02592

Authors: Kaihao Guo, Tianpei Jiang, Haipeng Zhang
Affiliation: School of Information Science and Technology, ShanghaiTech University, Shanghai, China
Abstract: Event extraction is a classic task in natural language processing, widely used to handle the large and rapidly growing volume of financial, legal, medical, and government documents. These documents often contain multiple events whose elements are scattered and mixed across the text, making the problem much more difficult. Although the underlying relations between the event elements to be extracted provide helpful contextual information, they have been largely overlooked in prior studies. We showcase the enhancement this task gains from a knowledge graph that captures entity relations and their attributes. We propose the first event extraction framework that embeds a knowledge graph through a graph neural network and integrates the embedding with regular features, all at the document level. For extracting events from Chinese financial announcements, our method outperforms the state-of-the-art method by 5.3% in F1-score.

【2】 PointSpectrum: Equivariance Meets Laplacian Filtering for Graph Representation Learning
Link: https://arxiv.org/abs/2109.02358

Authors: Marinos Poiitis, Pavlos Sermpezis, Athena Vakali
Affiliation: Aristotle University of Thessaloniki
Comments: 13 pages, 8 figures, 6 tables, AAAI 22 submission
Abstract: Graph Representation Learning (GRL) has become essential for modern graph data mining and learning tasks. GRL aims to capture the graph's structural information and exploit it in combination with node and edge attributes to compute low-dimensional representations. While Graph Neural Networks (GNNs) have been used in state-of-the-art GRL architectures, they have been shown to suffer from over-smoothing when many GNN layers need to be stacked. In a different GRL approach, spectral methods based on graph filtering have emerged to address over-smoothing; however, up to now, they employ traditional neural networks that cannot efficiently exploit the structure of graph data. Motivated by this, we propose PointSpectrum, a spectral method that incorporates a set-equivariant network to account for a graph's structure. PointSpectrum enhances the efficiency and expressiveness of spectral methods, while it outperforms or competes with state-of-the-art GRL methods. Overall, PointSpectrum addresses over-smoothing by employing a graph filter and captures a graph's structure through set equivariance, lying at the intersection of GNNs and spectral methods. Our findings are promising for the benefits and applicability of this architectural shift for spectral methods and GRL.

【3】 Quantifying the Reproducibility of Graph Neural Networks using Multigraph Brain Data
Link: https://arxiv.org/abs/2109.02248

Authors: Mohammed Amine Gharsallaoui, Islem Rekik
Affiliations: Disease Neuroimaging Initiative; BASIRA Lab, Istanbul Technical University, Istanbul, Turkey; School of Science and Engineering, Computing, University of Dundee, UK
Abstract: Graph neural networks (GNNs) have witnessed an unprecedented proliferation in tackling several problems in computer vision, computer-aided diagnosis, and related fields. While prior studies have focused on boosting model accuracy, quantifying the reproducibility of the most discriminative features identified by GNNs remains an untouched problem, which raises concerns about their reliability, particularly in clinical applications. Specifically, the reproducibility of biological markers across clinical datasets, and under distribution shifts across classes (e.g., healthy versus disordered brains), is of paramount importance for revealing the underpinning mechanisms of diseases as well as propelling the development of personalized treatment. Motivated by these issues, we propose, for the first time, reproducibility-based GNN selection (RG-Select), a framework for GNN reproducibility assessment via quantification of the most discriminative features (i.e., biomarkers) shared between different models. To ascertain the soundness of our framework, the reproducibility assessment embraces variations of different factors such as training strategies and data perturbations. Despite these challenges, our framework successfully yielded replicable conclusions across different training strategies and various clinical datasets. Our findings could thus pave the way for the development of biomarker trustworthiness and reliability assessment methods for computer-aided diagnosis and prognosis tasks. RG-Select code is available on GitHub at https://github.com/basiralab/RG-Select.
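
The abstract frames reproducibility as the share of top discriminative features (biomarkers) that two models agree on. Below is a hedged sketch of that quantification as a top-k overlap; the aggregation into a final RG-Select score is our assumption, not necessarily the paper's exact formula.

```python
import numpy as np

def topk_overlap(importance_a, importance_b, k=20):
    """importance_*: per-feature discriminativeness scores from two GNNs.
    Returns the fraction of the k most discriminative features they share."""
    top_a = set(np.argsort(importance_a)[-k:])
    top_b = set(np.argsort(importance_b)[-k:])
    return len(top_a & top_b) / k   # 1.0 = fully reproducible top-k biomarkers

# A model's reproducibility score could then be its average overlap against
# all other models, computed across training strategies and data perturbations.
```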

【4】 Soft Hierarchical Graph Recurrent Networks for Many-Agent Partially Observable Environments
Link: https://arxiv.org/abs/2109.02032

Authors: Zhenhui Ye, Xiaohong Jiang, Guanghua Song, Bowei Yang
Affiliations: School of Aerospace and Astronautics, Zhejiang University; College of Computer Science and Technology, Zhejiang University
Comments: 9 pages, 6 figures, 1 table. Under review.
Abstract: The recent progress in multi-agent deep reinforcement learning (MADRL) makes it more practical in real-world tasks, but its relatively poor scalability and the partially observable constraint raise challenges to its performance and deployment. Based on our intuitive observation that human society can be regarded as a large-scale partially observable environment, where each individual has the functions of communicating with neighbors and remembering their own experience, we propose a novel network structure called hierarchical graph recurrent network (HGRN) for multi-agent cooperation under partial observability. Specifically, we construct the multi-agent system as a graph, use the hierarchical graph attention network (HGAT) to achieve communication between neighboring agents, and exploit GRUs to enable agents to record historical information. To encourage exploration and improve robustness, we design a maximum-entropy learning method that learns stochastic policies with a configurable target action entropy. Based on the above technologies, we propose a value-based MADRL algorithm called Soft-HGRN and its actor-critic variant named SAC-HRGN. Experimental results based on three homogeneous tasks and one heterogeneous environment not only show that our approach achieves clear improvements compared with four baselines, but also demonstrate the interpretability, scalability, and transferability of the proposed model. Ablation studies prove the function and necessity of each component.

【5】 Structural Optimization Makes Graph Classification Simpler and Better
Link: https://arxiv.org/abs/2109.02027

Authors: Junran Wu, Jianhao Li, Yicheng Pan, Ke Xu
Affiliation: State Key Lab of Software Development Environment, Beihang University, Beijing, China
Abstract: In deep neural networks, better results can often be obtained by increasing the complexity of previously developed basic models. However, it is unclear whether there is a way to boost performance by decreasing the complexity of such models. Here, based on an optimization method, we investigate the feasibility of improving graph classification performance while simplifying the model learning process. Inspired by progress in structural information assessment, we optimize the given data samples from graphs to encoding trees. In particular, we minimize the structural entropy of the transformed encoding tree to decode the key structure underlying a graph; this transformation is denoted as structural optimization. Furthermore, we propose a novel feature-combination scheme, termed hierarchical reporting, for encoding trees. In this scheme, features are transferred from leaf nodes to root nodes by following the hierarchical structures of encoding trees. We then present implementations of the scheme in a tree kernel and a convolutional network to perform graph classification. The tree kernel follows label propagation in the Weisfeiler-Lehman (WL) subtree kernel, but it has a lower runtime complexity of $O(n)$. The convolutional network is a special implementation of our tree kernel in the deep learning field and is called Encoding Tree Learning (ETL). We empirically validate our tree kernel and convolutional network on several graph classification benchmarks and demonstrate that our methods achieve better performance and lower computational consumption than competing approaches.

【6】 Training Graph Neural Networks by Graphon Estimation
Link: https://arxiv.org/abs/2109.01918

Authors: Ziqing Hu, Yihao Fang, Lizhen Lin
Affiliation: Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN, USA
Abstract: In this work, we propose to train a graph neural network via resampling from a graphon estimate obtained from the underlying network data. More specifically, the graphon or the link probability matrix of the underlying network is first estimated, from which a new network is resampled and used during the training process at each layer. Due to the uncertainty induced by the resampling, this helps mitigate the well-known issue of over-smoothing in a graph neural network (GNN) model. Our framework is general, computationally efficient, and conceptually simple. Another appealing feature of our method is that it requires minimal additional tuning during the training process. Extensive numerical results show that our approach is competitive with, and in many cases outperforms, other over-smoothing-reducing GNN training methods.
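
The resampling idea is easy to prototype. Below is a minimal sketch in which the link-probability matrix is estimated with a crude neighborhood-smoothing heuristic (the paper's actual graphon estimator may differ) and a fresh adjacency matrix is then drawn from it during training.

```python
import numpy as np

def estimate_link_probabilities(A, k=10):
    """Crude link-probability estimate: average the adjacency rows of the
    k most similar nodes (a simplified neighborhood-smoothing estimator)."""
    n = A.shape[0]
    sim = A @ A.T                                  # co-neighborhood counts as similarity
    P = np.zeros_like(A, dtype=float)
    for i in range(n):
        neighbors = np.argsort(sim[i])[-k:]        # k most similar nodes
        P[i] = A[neighbors].mean(axis=0)
    P = (P + P.T) / 2                              # symmetrize
    np.fill_diagonal(P, 0.0)
    return P

def resample_adjacency(P, rng):
    """Draw a new symmetric adjacency matrix with edge probabilities P."""
    upper = np.triu(rng.random(P.shape) < P, 1).astype(float)
    return upper + upper.T

# Inside a GNN training loop one would call resample_adjacency(P, rng)
# each epoch (or, per the abstract, per layer) and use the resampled graph
# for message passing; the injected stochasticity counteracts over-smoothing.
```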

【7】 Node Feature Kernels Increase Graph Convolutional Network Robustness
Link: https://arxiv.org/abs/2109.01785

Authors: Mohamed El Amine Seddik, Changmin Wu, Johannes F. Lutzeyer, Michalis Vazirgiannis
Affiliations: Huawei, Paris, France; École Polytechnique, Palaiseau, France
Comments: 16 pages, 5 figures
Abstract: The robustness of the much-used Graph Convolutional Networks (GCNs) to perturbations of their input is becoming a topic of increasing importance. In this paper, the random GCN is introduced, for which a random matrix theory analysis is possible. This analysis suggests that if the graph is sufficiently perturbed, or in the extreme case random, then the GCN fails to benefit from the node features. It is furthermore observed that enhancing the message passing step in GCNs by adding the node feature kernel to the adjacency matrix of the graph structure solves this problem. An empirical study of a GCN utilised for node classification on six real datasets further confirms the theoretical findings and demonstrates that perturbations of the graph structure can result in GCNs performing significantly worse than Multi-Layer Perceptrons run on the node features alone. In practice, adding a node feature kernel to the message passing of perturbed graphs results in a significant improvement of the GCN's performance, thereby rendering it more robust to graph perturbations. Our code is publicly available at: https://github.com/ChangminWu/RobustGCN.
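
The proposed fix is concrete enough to sketch: add a kernel built from the node features to the adjacency matrix before normalization. The linear kernel X X^T, the mixing weight gamma, and the scaling below are illustrative assumptions rather than the paper's exact choices.

```python
import torch

def augmented_propagation_matrix(A, X, gamma=0.5):
    """A: (n, n) adjacency matrix; X: (n, d) node feature matrix."""
    K = X @ X.T                          # linear node-feature kernel (n x n)
    K = K / (K.abs().max() + 1e-8)       # scale kernel to a comparable range
    A_aug = A + gamma * K                # adjacency + feature kernel
    deg = A_aug.sum(dim=1).clamp(min=1e-8)
    D_inv_sqrt = torch.diag(deg.pow(-0.5))
    return D_inv_sqrt @ A_aug @ D_inv_sqrt   # symmetric normalization

# One GCN layer using the augmented operator:
# H_next = torch.relu(augmented_propagation_matrix(A, X) @ H @ W)
```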

【8】 Multi-View Spatial-Temporal Graph Convolutional Networks with Domain Generalization for Sleep Stage Classification
Link: https://arxiv.org/abs/2109.01824

Authors: Ziyu Jia, Youfang Lin, Jing Wang, Xiaojun Ning, Yuanlai He, Ronghao Zhou, Yuhan Zhou, Li-wei H. Lehman
Comments: Accepted by IEEE Transactions on Neural Systems and Rehabilitation Engineering (TNSRE)
Abstract: Sleep stage classification is essential for sleep assessment and disease diagnosis. Although previous attempts to classify sleep stages have achieved high classification performance, several challenges remain open: 1) How to effectively utilize time-varying spatial and temporal features from multi-channel brain signals remains challenging; prior works have not been able to fully utilize the spatial topological information among brain regions. 2) Given the many differences found in individual biological signals, how to overcome subject differences and improve the generalization of deep neural networks is important. 3) Most deep learning methods ignore the interpretability of the model with respect to the brain. To address the above challenges, we propose multi-view spatial-temporal graph convolutional networks (MSTGCN) with domain generalization for sleep stage classification. Specifically, we construct two brain-view graphs for MSTGCN based on the functional connectivity and the physical distance proximity of brain regions. MSTGCN consists of graph convolutions for extracting spatial features and temporal convolutions for capturing the transition rules among sleep stages. In addition, an attention mechanism is employed to capture the most relevant spatial-temporal information for sleep stage classification. Finally, domain generalization and MSTGCN are integrated into a unified framework to extract subject-invariant sleep features. Experiments on two public datasets demonstrate that the proposed model outperforms the state-of-the-art baselines.

Transformer (3 papers)

【1】 3D Human Texture Estimation from a Single Image with Transformers
Link: https://arxiv.org/abs/2109.02563

Authors: Xiangyu Xu, Chen Change Loy
Affiliation: S-Lab, Nanyang Technological University
Abstract: We propose a Transformer-based framework for 3D human texture estimation from a single image. The proposed Transformer is able to effectively exploit the global information of the input image, overcoming the limitations of existing methods that are solely based on convolutional neural networks. In addition, we propose a mask-fusion strategy to combine the advantages of the RGB-based and texture-flow-based models. We further introduce a part-style loss to help reconstruct high-fidelity colors without introducing unpleasant artifacts. Extensive experiments demonstrate the effectiveness of the proposed method against state-of-the-art 3D human texture estimation approaches, both quantitatively and qualitatively.
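
The mask-fusion strategy can be pictured as a learned blend of two texture hypotheses. A minimal sketch follows, assuming the fusion is a sigmoid-masked convex combination of the RGB-based prediction and a texture sampled through the predicted flow; the tensor shapes and the warping call are our assumptions, not the paper's exact implementation.

```python
import torch
import torch.nn.functional as F

def mask_fusion(tex_rgb, flow, image, mask_logits):
    """tex_rgb: (B,3,H,W) texture regressed directly from the image;
    flow: (B,H,W,2) sampling grid in [-1,1] mapping texel -> image coords;
    image: (B,3,H,W) input image; mask_logits: (B,1,H,W) fusion mask."""
    tex_flow = F.grid_sample(image, flow, align_corners=False)  # warped texture
    m = torch.sigmoid(mask_logits)
    return m * tex_rgb + (1.0 - m) * tex_flow
```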

【2】 A Transformer-based Model to Detect Phishing URLs
Link: https://arxiv.org/abs/2109.02138

Authors: Pingfan Xu
Abstract: Phishing attacks are among the emerging security issues that have recently drawn significant attention in the cyber security community. There are numerous existing approaches for phishing URL detection. However, malicious URL detection is still a research hotspot because attackers can bypass newly introduced detection mechanisms by changing their tactics. This paper introduces a transformer-based malicious URL detection model, which has significant accuracy and outperforms current detection methods. We conduct experiments and compare against six existing classical detection models. The experiments demonstrate that our transformer-based model is the best-performing of the seven models from all perspectives, achieving 97.3% detection accuracy.
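
The abstract reports accuracy but no architecture details, so the following is only a plausible sketch of a character-level Transformer URL classifier in PyTorch; the tokenization, model sizes, and mean-pooling are assumptions.

```python
import torch
import torch.nn as nn

class URLTransformer(nn.Module):
    def __init__(self, vocab_size=128, d_model=64, n_heads=4, n_layers=2, max_len=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=128,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, 2)          # phishing vs. benign

    def forward(self, char_ids):                   # char_ids: (B, L) ASCII codes
        h = self.embed(char_ids) + self.pos[:, :char_ids.size(1)]
        h = self.encoder(h)
        return self.head(h.mean(dim=1))            # mean-pool over characters

def encode_url(url, max_len=256):
    ids = [min(ord(c), 127) for c in url[:max_len]]
    return torch.tensor(ids + [0] * (max_len - len(ids))).unsqueeze(0)

# logits = URLTransformer()(encode_url("http://example.com/login"))
```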

【3】 Error Detection in Large-Scale Natural Language Understanding Systems Using Transformer Models
Link: https://arxiv.org/abs/2109.01754

Authors: Rakesh Chada, Pradeep Natarajan, Darshan Fofadiya, Prathap Ramachandra
Affiliation: Amazon Alexa AI
Comments: Accepted to ACL Findings 2021
Abstract: Large-scale conversational assistants like Alexa, Siri, Cortana, and Google Assistant process every utterance using multiple models for domain, intent, and named entity recognition. Given the decoupled nature of model development and large traffic volumes, it is extremely difficult to identify utterances processed erroneously by such systems. We address this challenge by detecting domain classification errors using offline Transformer models. We combine utterance encodings from a RoBERTa model with the N-best hypotheses produced by the production system. We then fine-tune end-to-end in a multitask setting using a small dataset of human-annotated utterances with domain classification errors. We tested our approach on detecting misclassifications from one domain that accounts for <0.5% of the traffic in a large-scale conversational AI system. Our approach achieves an F1 score of 30%, outperforming a bi-LSTM baseline by 16.9% and a standalone RoBERTa model by 4.8%. We improve this further, by 2.2% to 32.2%, by ensembling multiple models.

GAN | Adversarial | Attacks | Generation (9 papers)

【1】 An Efficient Deep Learning Approach Using Improved Generative Adversarial Networks for Incomplete Information Completion of Self-driving
Link: https://arxiv.org/abs/2109.02629

Authors: Jingzhi Tu, Gang Mei, Francesco Piccialli
Comments: 10 figures, 4 tables
Abstract: Autonomous driving is a key technology for intelligent logistics in the Industrial Internet of Things (IIoT). In autonomous driving, incomplete point clouds that lose geometric and semantic information inevitably appear owing to limitations of occlusion, sensor resolution, and viewing angle when Light Detection And Ranging (LiDAR) is applied. The emergence of incomplete point clouds, especially incomplete vehicle point clouds, reduces the accuracy of autonomous vehicles in object detection, traffic alerts, and collision avoidance. Existing point cloud completion networks, such as the Point Fractal Network (PF-Net), focus on the accuracy of point cloud completion without considering the efficiency of the inference process, which makes them difficult to deploy for vehicle point cloud repair in autonomous driving. To address this problem, in this paper we propose an efficient deep learning approach to repair incomplete vehicle point clouds accurately and efficiently in autonomous driving. In the proposed method, an efficient downsampling algorithm combining incremental sampling and one-time sampling is presented to improve the inference speed of the GAN-based PF-Net. To evaluate the performance of the proposed method, a real dataset is used and an autonomous driving scene is created, where three incomplete vehicle point clouds of 5 different sizes are set up for three autonomous driving situations. The improved PF-Net achieves speedups of over 19x with almost the same accuracy as the original PF-Net. Experimental results demonstrate that the improved PF-Net can be applied to efficiently complete vehicle point clouds in autonomous driving.

【2】 Generation of Synthetic Electronic Health Records Using a Federated GAN
Link: https://arxiv.org/abs/2109.02543

Authors: John Weldon, Tomás Ward, Eoin Brophy
Affiliations: Dublin City University, Ireland; Insight SFI Centre for Data Analytics, School of Computing; INFANT Research Centre
Abstract: Sensitive medical data is often subject to strict usage constraints. In this paper, we trained a generative adversarial network (GAN) on real-world electronic health records (EHR). It was then used to create a dataset of "fake" patients through synthetic data generation (SDG) to circumvent usage constraints. This real-world data was tabular, binary, intensive care unit (ICU) patient diagnosis data. The entire dataset was split into separate data silos to mimic real-world scenarios where multiple ICU units across different hospitals may have similarly structured datasets within their own organisations but do not have access to each other's datasets. We implemented federated learning (FL) to train separate GANs locally at each organisation, using their unique data silo, and then combined the GANs into a single central GAN, without any siloed data ever being exposed. This global, central GAN was then used to generate the synthetic patient dataset. We evaluated these synthetic patients with statistical measures and through a structured review by a group of medical professionals. There was no significant reduction in the quality of the synthetic EHR when moving from training a single central model to training individual models on separate data silos before combining them into a central model. This held for both the statistical evaluation (Root Mean Square Error (RMSE) of 0.0154 for single-source vs. RMSE of 0.0169 for dual-source federated) and the medical professionals' evaluation (no quality difference between EHR generated from a single source and EHR generated from multiple sources).
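
The federation step can be sketched as federated averaging of the locally trained GAN weights. Equal silo weighting and plain FedAvg are assumptions; the paper's exact aggregation scheme may differ.

```python
import copy
import torch

def federated_average(models):
    """Average the parameters of a list of identically shaped nn.Modules."""
    global_model = copy.deepcopy(models[0])
    global_state = global_model.state_dict()
    for key in global_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        global_state[key] = stacked.mean(dim=0)
    global_model.load_state_dict(global_state)
    return global_model

# Per round: train each silo's (generator, discriminator) locally on its
# private EHR data, then:
#   central_G = federated_average([silo.G for silo in silos])
#   central_D = federated_average([silo.D for silo in silos])
# and broadcast the central weights back. Synthetic patients are finally
# sampled from central_G, so no siloed data is ever shared.
```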

【3】 Automated Robustness with Adversarial Training as a Post-Processing Step
Link: https://arxiv.org/abs/2109.02532

Authors: Ambrish Rawat, Mathieu Sinn, Beat Buesser
Affiliation: IBM Research Europe, IBM Technology Campus, Damastown Ind. Park, Dublin, Ireland
Abstract: Adversarial training is a computationally expensive task, and hence searching for neural network architectures with robustness as the criterion can be challenging. As a step towards practical automation, this work explores the efficacy of a simple post-processing step in yielding robust deep learning models. To achieve this, we adopt adversarial training as a post-processing step for optimised network architectures obtained from a neural architecture search algorithm. Specific policies are adopted for tuning the hyperparameters of the different steps, resulting in a fully automated pipeline for generating adversarially robust deep learning models. We evidence the usefulness of the proposed pipeline with extensive experimentation across 11 image classification and 9 text classification tasks.
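
The post-processing step itself is standard adversarial training. A minimal sketch using PGD follows (a common choice; the abstract does not commit to a specific attack, and the hyperparameters below are placeholders rather than the paper's tuned policies).

```python
import torch
import torch.nn.functional as F

def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
    """Generate L-infinity bounded adversarial examples with PGD."""
    x_adv = x + torch.empty_like(x).uniform_(-eps, eps)
    for _ in range(steps):
        x_adv = x_adv.detach().requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv + alpha * grad.sign()
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

def adversarial_finetune(model, loader, optimizer, epochs=5):
    """Post-process a NAS-found architecture by training on PGD examples."""
    model.train()
    for _ in range(epochs):
        for x, y in loader:
            x_adv = pgd_attack(model, x, y)
            optimizer.zero_grad()
            F.cross_entropy(model(x_adv), y).backward()
            optimizer.step()
```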

【4】 Backdoor Attack and Defense for Deep Regression
Link: https://arxiv.org/abs/2109.02381

Authors: Xi Li, George Kesidis, David J. Miller, Vladimir Lucic
Affiliations: Pennsylvania State University; Anomalee Inc.; Imperial College
Abstract: We demonstrate a backdoor attack on a deep neural network used for regression. The backdoor attack is localized and based on training-set data poisoning, wherein the mislabeled samples are surrounded by correctly labeled ones. We demonstrate how such localization is necessary for attack success. We also study the performance of a backdoor defense using gradient-based discovery of local error maximizers. Local error maximizers that are associated with significant (interpolation) error and are proximal to many training samples are suspicious. This method is also used to accurately train for deep regression in the first place, by active (deep) learning leveraging an "oracle" capable of providing real-valued supervision (a regression target) for samples. Such oracles, including traditional numerical solvers of PDEs or SDEs using finite-difference or Monte Carlo approximations, are far more computationally costly than deep regression.

【5】 Gradient Normalization for Generative Adversarial Networks
Link: https://arxiv.org/abs/2109.02235

Authors: Yi-Lun Wu, Hong-Han Shuai, Zhi-Rui Tam, Hong-Yu Chiu
Affiliation: National Yang Ming Chiao Tung University
Comments: Published as a conference paper at ICCV 2021
Abstract: In this paper, we propose a novel normalization method called gradient normalization (GN) to tackle the training instability of Generative Adversarial Networks (GANs) caused by a sharp gradient space. Unlike existing work such as gradient penalty and spectral normalization, the proposed GN imposes only a hard 1-Lipschitz constraint on the discriminator function, which increases the capacity of the discriminator. Moreover, the proposed gradient normalization can be applied to different GAN architectures with little modification. Extensive experiments on four datasets show that GANs trained with gradient normalization outperform existing methods in terms of both Fréchet Inception Distance and Inception Score.
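
The idea is to rescale the discriminator output by its input gradient norm so that the normalized function is 1-Lipschitz. A sketch following the published formulation f(x) / (||grad_x f(x)|| + |f(x)|) is below; check the authors' release before relying on the exact normalizer.

```python
import torch

def grad_normalized_discriminator(f, x):
    """f: discriminator network mapping images to raw scalar scores;
    x: input batch of shape (B, C, H, W). Returns normalized scores (B,)."""
    x = x.requires_grad_(True)
    fx = f(x).flatten(start_dim=1).sum(dim=1)           # raw scores, shape (B,)
    grad = torch.autograd.grad(fx.sum(), x, create_graph=True)[0]
    grad_norm = grad.flatten(start_dim=1).norm(dim=1)   # ||grad_x f(x)|| per sample
    return fx / (grad_norm + fx.abs() + 1e-8)           # gradient-normalized score

# create_graph=True keeps the normalizer differentiable, so the usual GAN
# losses can be backpropagated through the normalized discriminator.
```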

【6】 VARGAN: Variance Enforcing Network Enhanced GAN
Link: https://arxiv.org/abs/2109.02117

Authors: Sanaz Mohammadjafari, Mucahit Cevik, Ayse Basar
Abstract: Generative adversarial networks (GANs) are among the most widely used generative models. GANs can learn complex multi-modal distributions and generate realistic samples. Despite the major success of GANs in generating synthetic data, they can suffer from an unstable training process and mode collapse. In this paper, we introduce a new GAN architecture called variance enforcing GAN (VARGAN), which incorporates a third network to introduce diversity in the generated samples. The third network measures the diversity of the generated samples and is used to penalize the generator's loss for low-diversity samples. The network is trained on the available training data and on undesired distributions with limited modality. On a set of synthetic and real-world image data, VARGAN generates a more diverse set of samples compared to recent state-of-the-art models. High diversity, low computational complexity, and fast convergence make VARGAN a promising model for alleviating mode collapse.

【7】 Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Link: https://arxiv.org/abs/2109.02096

Authors: Russell Sammut Bonnici, Charalampos Saitis, Martin Benning
Affiliations: School of Electronic Engineering and Computer Science, Queen Mary University of London, United Kingdom; School of Mathematical Sciences
Comments: 12 pages, 3 main figures, 4 tables
Abstract: This research project investigates the application of deep learning to timbre transfer, where the timbre of a source audio can be converted to the timbre of a target audio with minimal loss in quality. The adopted approach combines Variational Autoencoders with Generative Adversarial Networks to construct meaningful representations of the source audio and produce realistic generations of the target audio. It is applied to the Flickr 8k Audio dataset for transferring vocal timbre between speakers and the URMP dataset for transferring musical timbre between instruments. Furthermore, variations of the adopted approach are trained, and generalised performance is compared using the metrics SSIM (Structural Similarity Index) and FAD (Fréchet Audio Distance). It was found that a many-to-many approach supersedes a one-to-one approach in terms of reconstructive capabilities, and that adopting a basic over a bottleneck residual block design is more suitable for enriching content information about a latent space. It was also found that the decision on whether the cyclic loss takes a variational autoencoder or vanilla autoencoder approach does not have a significant impact on the reconstructive and adversarial translation aspects of the model.

【8】 Tolerating Adversarial Attacks and Byzantine Faults in Distributed Machine Learning
Link: https://arxiv.org/abs/2109.02018

Authors: Yusen Wu, Hao Chen, Xin Wang, Chao Liu, Phuong Nguyen, Yelena Yesha
Affiliations: University of Maryland, Baltimore County, Baltimore, MD, USA; OpenKneck Inc, Halethorpe, MD, USA; University of Miami, FL, USA
Comments: 10 pages, 4 figures, conference
Abstract: Adversarial attacks attempt to disrupt the training, retraining, and use of artificial intelligence and machine learning models in large-scale distributed machine learning systems, creating security risks for their prediction outcomes. For example, attackers attempt to poison the model by presenting inaccurate, misrepresentative data or by altering the model's parameters. In addition, Byzantine faults, including software, hardware, and network issues, occur in distributed systems and also negatively impact prediction outcomes. In this paper, we propose a novel distributed training algorithm, partial synchronous stochastic gradient descent (ParSGD), which defends against adversarial attacks and/or tolerates Byzantine faults. We demonstrate the effectiveness of our algorithm under three common adversarial attacks against ML models and a Byzantine fault during the training phase. Our results show that with ParSGD, ML models can still produce accurate predictions, as if they were neither attacked nor experiencing failures, even when almost half of the nodes are compromised or have failed. We report experimental evaluations of ParSGD in comparison with other algorithms.
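
The abstract does not spell out ParSGD's aggregation rule, so the sketch below only illustrates the general pattern such algorithms follow: wait for the first k of n worker gradients (partial synchrony), then combine them with a robust statistic so that a minority of Byzantine or poisoned gradients cannot dominate the update. The coordinate-wise trimmed mean here is our assumption.

```python
import numpy as np

def robust_partial_aggregate(gradients, k, trim_frac=0.25):
    """gradients: list of flat np.ndarray worker gradients, in arrival order.
    Uses only the first k to arrive and trims per-coordinate extremes."""
    arrived = np.stack(gradients[:k])       # first k gradients to arrive
    t = int(trim_frac * k)
    sorted_g = np.sort(arrived, axis=0)     # sort each coordinate across workers
    trimmed = sorted_g[t:k - t]             # drop t extremes on each side
    return trimmed.mean(axis=0)

# Server update per step: w -= lr * robust_partial_aggregate(worker_grads, k=8)
```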

【9】 Training Meta-Surrogate Model for Transferable Adversarial Attack
Link: https://arxiv.org/abs/2109.01983

Authors: Yunxiao Qin, Yuanhao Xiong, Jinfeng Yi, Cho-Jui Hsieh
Affiliations: JD Technology; University of California, Los Angeles
Comments: 15 pages, 7 figures
Abstract: We consider adversarial attacks on a black-box model when no queries are allowed. In this setting, many methods directly attack surrogate models and transfer the obtained adversarial examples to fool the target model. Plenty of previous works have investigated what kinds of attacks on the surrogate model can generate more transferable adversarial examples, but their performance is still limited due to mismatches between the surrogate models and the target model. In this paper, we tackle this problem from a novel angle: instead of using the original surrogate models, can we obtain a Meta-Surrogate Model (MSM) such that attacks on this model transfer more easily to other models? We show that this goal can be mathematically formulated as a well-posed (bi-level-like) optimization problem, and we design a differentiable attacker to make training feasible. Given one or a set of surrogate models, our method can thus obtain an MSM such that adversarial examples generated on it enjoy eximious transferability. Comprehensive experiments on CIFAR-10 and ImageNet demonstrate that by attacking the MSM, we obtain stronger transferable adversarial examples that fool black-box models, including adversarially trained ones, with much higher success rates than existing methods. The proposed method reveals significant security challenges for deep models and is promising to serve as a state-of-the-art benchmark for evaluating the robustness of deep models in the black-box setting.

Semi/Weakly/Un/Supervised Learning | Uncertainty | Active Learning (9 papers)

【1】 Active Learning for Automated Visual Inspection of Manufactured Products
Link: https://arxiv.org/abs/2109.02469

Authors: Elena Trajkova, Jože M. Rožanec, Paulien Dam, Blaž Fortuna, Dunja Mladenić
Affiliations: University of Ljubljana, Electrical Engineering, Ljubljana, Slovenia; Jožef Stefan International Postgraduate School; Philips Consumer Lifestyle BV, Drachten, The Netherlands; Qlector d.o.o.; Jožef Stefan Institute
Abstract: Quality control is a key activity performed by manufacturing enterprises to ensure that products meet quality standards and to avoid potential damage to the brand's reputation. The decreased cost of sensors and connectivity has enabled an increasing digitalization of manufacturing. In addition, artificial intelligence enables higher degrees of automation, reducing the overall costs and time required for defect inspection. In this research, we compare three active learning approaches and five machine learning algorithms applied to visual defect inspection with real-world data provided by Philips Consumer Lifestyle BV. Our results show that active learning reduces the data labeling effort without detriment to the models' performance.
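
A pool-based active-learning round of the kind compared here can be sketched with least-confidence sampling, one of the standard query strategies. This generic loop is illustrative, not the paper's exact protocol.

```python
import numpy as np

def uncertainty_query(model, pool_X, batch_size=16):
    """Pick the pool items the model is least confident about."""
    proba = model.predict_proba(pool_X)          # (n_pool, n_classes)
    confidence = proba.max(axis=1)               # top-class probability
    return np.argsort(confidence)[:batch_size]   # least confident first

# One round with any scikit-learn-style classifier:
#   idx = uncertainty_query(model, pool_X)
#   labeled_X = np.concatenate([labeled_X, pool_X[idx]])
#   labeled_y = np.concatenate([labeled_y, oracle_label(idx)])
#   pool_X = np.delete(pool_X, idx, axis=0)
#   model.fit(labeled_X, labeled_y)
```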

【2】 Complementary Calibration: Boosting General Continual Learning with Collaborative Distillation and Self-Supervision
Link: https://arxiv.org/abs/2109.02426

Authors: Zhong Ji, Jin Li, Qiang Wang, Zhongfei Zhang
Affiliation: Tianjin Key Laboratory of Brain-Inspired Intelligence Technology, Tianjin University
Abstract: General Continual Learning (GCL) aims at learning from non-independent and identically distributed stream data without catastrophic forgetting of old tasks, relying on no task boundaries during either the training or testing stage. We reveal that relation and feature deviations are crucial problems behind catastrophic forgetting, where relation deviation refers to the deficiency of the relationship among all classes in knowledge distillation, and feature deviation refers to indiscriminative feature representations. To this end, we propose a Complementary Calibration (CoCa) framework that mines the complementary model's outputs and features to alleviate these two deviations in the process of GCL. Specifically, we propose a new collaborative distillation approach for addressing the relation deviation. It distills the model's outputs by utilizing the ensemble dark knowledge of the new model's outputs and the reserved outputs, which maintains the performance of old tasks as well as balancing the relationships among all classes. Furthermore, we explore a collaborative self-supervision idea that leverages pretext tasks and supervised contrastive learning to address the feature deviation problem by learning complete and discriminative features for all classes. Extensive experiments on four popular datasets show that our CoCa framework achieves superior performance against state-of-the-art methods.

【3】 Supervised DKRC with Images for Offline System Identification
Link: https://arxiv.org/abs/2109.02241

Authors: Alexander Krolicki, Pierre-Yves Lavertu
Affiliation: Clemson University
Comments: 7 pages, 9 figures, 2 algorithms; will be updated with new examples once available
Abstract: Koopman spectral theory has provided a new perspective in the field of dynamical systems in recent years. Modern dynamical systems are becoming increasingly non-linear and complex, and there is a need for a framework to model these systems in a compact and comprehensive representation for prediction and control. The central problem in applying Koopman theory to a system of interest is that the choice of finite-dimensional basis functions is typically done a priori, using expert knowledge of the system's dynamics. Our approach learns these basis functions using a supervised learning approach, where a combination of autoencoders and deep neural networks learns the basis functions for any given system. We demonstrate this approach on a simple pendulum example, in which we obtain a linear representation of the non-linear system and then predict future state trajectories given some initial conditions. We also explore how changing the input representation of the dynamic system's time-series data can impact the quality of the learned basis functions. This alternative representation is compared to the traditional raw time-series data approach to determine which method results in lower reconstruction and prediction error of the true non-linear dynamics of the system.
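
The training objective behind such Koopman autoencoders is compact enough to sketch: an encoder lifts states into a latent space where a linear operator K advances the dynamics one step. The layer sizes and the equal weighting of the three loss terms below are assumptions consistent with the abstract, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class KoopmanAE(nn.Module):
    def __init__(self, state_dim=2, latent_dim=16):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(),
                                 nn.Linear(64, latent_dim))
        self.dec = nn.Sequential(nn.Linear(latent_dim, 64), nn.Tanh(),
                                 nn.Linear(64, state_dim))
        self.K = nn.Linear(latent_dim, latent_dim, bias=False)  # linear Koopman operator

    def loss(self, x_t, x_next):
        z_t, z_next = self.enc(x_t), self.enc(x_next)
        recon = (self.dec(z_t) - x_t).pow(2).mean()              # autoencoding
        linear = (self.K(z_t) - z_next).pow(2).mean()            # linear latent dynamics
        pred = (self.dec(self.K(z_t)) - x_next).pow(2).mean()    # one-step prediction
        return recon + linear + pred

# After training, multi-step forecasts are z_{t+k} = K^k enc(x_t),
# decoded back to state space with dec.
```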

【4】 Providing an Approach to Predicting Customer Quality in E-Commerce Social Networks Based on Big Data and Unsupervised Learning Method
Link: https://arxiv.org/abs/2109.02080

Authors: Mohammad Arab
Affiliation: Department of Entrepreneurship, University of Tehran, Tehran, Iran
Abstract: One of the goals of every business enterprise is to increase customer loyalty. The degree of customer loyalty is called customer quality, and forecasting it affects strategic marketing practices. The purpose of this study is to predict the quality of customers of large e-commerce social networks using big data algorithms and unsupervised learning. For this purpose, a graph-based social network analysis framework was used for community detection in the Stanford Network Analysis Platform (SNAP). Customer quality was then predicted within the detected communities. The results showed that the number of various visits, with an impact of 37.13%, had the greatest effect on customer quality, and the impact of the other parameters, from highest to lowest, was: number of frequent customer visits (28.56%), role in social networks (28.37%), indirect transactions (26.74%), activity days (25.62%), and customer social network size (25.06%).

【5】 Weakly Supervised Relative Spatial Reasoning for Visual Question Answering
Link: https://arxiv.org/abs/2109.01934

Authors: Pratyay Banerjee, Tejas Gokhale, Yezhou Yang, Chitta Baral
Affiliation: Arizona State University
Comments: Accepted to ICCV 2021. Paper ID: ICCV2021-10857. Copyright transferred to IEEE ICCV; DOI to be updated later.
Abstract: Vision-and-language (V&L) reasoning necessitates perception of visual concepts such as objects and actions, understanding semantics and language grounding, and reasoning about the interplay between the two modalities. One crucial aspect of visual reasoning is spatial understanding, which involves understanding the relative locations of objects, i.e., implicitly learning the geometry of the scene. In this work, we evaluate the faithfulness of V&L models to such geometric understanding by formulating the prediction of pairwise relative locations of objects as both a classification and a regression task. Our findings suggest that state-of-the-art Transformer-based V&L models lack sufficient abilities to excel at this task. Motivated by this, we design two objectives as proxies for 3D spatial reasoning (SR): object centroid estimation and relative position estimation, and train V&L models with weak supervision from off-the-shelf depth estimators. This leads to considerable improvements in accuracy on the GQA visual question answering challenge (in fully supervised, few-shot, and O.O.D. settings) as well as improvements in relative spatial reasoning. Code and data will be released at https://github.com/pratyay-banerjee/weak_sup_vqa.

【6】 Weakly Supervised Few-Shot Segmentation Via Meta-Learning
Link: https://arxiv.org/abs/2109.01693

Authors: Pedro H. T. Gama, Hugo Oliveira, José Marcato Junior, Jefersson A. dos Santos
Affiliation: Hugo Oliveira is with the Institute of Mathematics and Statistics (IME), University of São Paulo (USP)
Abstract: Semantic segmentation is a classic computer vision task with multiple applications, including medical and remote sensing image analysis. Despite recent advances with deep-based approaches, labeling samples (pixels) for training models is laborious and, in some cases, unfeasible. In this paper, we present two novel meta-learning methods, named WeaSeL and ProtoSeg, for the few-shot semantic segmentation task with sparse annotations. We conducted an extensive evaluation of the proposed methods in different applications (12 datasets) in medical imaging and agricultural remote sensing, which are very distinct fields of knowledge and usually subject to data scarcity. The results demonstrate the potential of our methods, which achieve suitable results for segmenting both coffee/orange crops and anatomical parts of the human body in comparison with full dense annotation.

【7】 ALLWAS: Active Learning on Language models in WASserstein space
Link: https://arxiv.org/abs/2109.01691

Authors: Anson Bastos, Manohar Kaul
Affiliation: IIT Hyderabad, India
Abstract: Active learning has emerged as a standard paradigm in areas with a scarcity of labeled training data, such as the medical domain. Language models have become the prevalent choice for several natural language tasks due to the performance boost they offer. However, in several domains, such as medicine, the scarcity of labeled training data is a common issue. Also, these models may not work well in cases where class imbalance is prevalent. Active learning may prove helpful in these cases to boost performance with a limited label budget. To this end, we propose a novel method for active learning in language models using sampling techniques based on submodular optimization and optimal transport, dubbed ALLWAS. We construct a sampling strategy based on submodular optimization of the designed objective in the gradient domain. Furthermore, to enable learning from few samples, we propose a novel strategy for sampling from the Wasserstein barycenters. Our empirical evaluations on standard benchmark datasets for text classification show that our methods perform significantly better (>20% relative increase in some cases) than existing approaches for active learning on language models.

【8】 Learning Neural Causal Models with Active Interventions
Link: https://arxiv.org/abs/2109.02429

Authors: Nino Scherrer, Olexa Bilaniuk, Yashas Annadani, Anirudh Goyal, Patrick Schwab, Bernhard Schölkopf, Michael C. Mozer, Yoshua Bengio, Stefan Bauer, Nan Rosemary Ke
Affiliations: ETH Zurich; Mila, Université de Montréal; GlaxoSmithKline; Max Planck Institute for Intelligent Systems; Google Research, Brain Team; CIFAR Azrieli Global Scholar; KTH Stockholm; DeepMind
Abstract: Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far, differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism that enables quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable to both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data.

【9】 Weakly supervised semantic segmentation of tomographic images in the diagnosis of stroke
Link: https://arxiv.org/abs/2109.01887

Authors: Anna Dobshik, Andrey Tulupov, Vladimir Berikov
Affiliations: Novosibirsk State University, Russia; Sobolev Institute of Mathematics SB RAS
Comments: 6 pages, 3 figures, 1 table
Abstract: This paper presents an automatic algorithm for the segmentation of areas affected by an acute stroke on non-contrast computed tomography brain images. The proposed algorithm is designed for learning in a weakly supervised scenario, when some images are labeled accurately and some images are labeled inaccurately. Wrong labels appear as a result of inaccuracies made by a radiologist in the process of manually annotating computed tomography images. We propose methods for solving the segmentation problem in the case of inaccurately labeled training data. We use the U-Net neural network architecture with several modifications. Experiments on real computed tomography scans show that the proposed methods increase the segmentation accuracy.

Transfer | Zero/Few/One-Shot | Adaptation (5 papers)

【1】 GPT-3 Models are Poor Few-Shot Learners in the Biomedical Domain
Link: https://arxiv.org/abs/2109.02555

Authors: Milad Moradi, Kathrin Blagec, Florian Haberl, Matthias Samwald
Affiliation: Institute for Artificial Intelligence, Medical University of Vienna, Austria
Abstract: Deep neural language models have set new breakthroughs in many tasks of Natural Language Processing (NLP). Recent work has shown that deep transformer language models (pretrained on large amounts of text) can achieve high levels of task-specific few-shot performance comparable to state-of-the-art models. However, the ability of these large language models in few-shot transfer learning has not yet been explored in the biomedical domain. We investigated the performance of two powerful transformer language models, i.e. GPT-3 and BioBERT, in few-shot settings on various biomedical NLP tasks. The experimental results showed that, to a great extent, both models underperform a language model fine-tuned on the full training data. Although GPT-3 had already achieved near state-of-the-art results in few-shot knowledge transfer on open-domain NLP tasks, it could not perform as effectively as BioBERT, which is orders of magnitude smaller than GPT-3. Given that BioBERT was already pretrained on large biomedical text corpora, our study suggests that language models may largely benefit from in-domain pretraining in task-specific few-shot learning. However, in-domain pretraining seems not to be sufficient; novel pretraining and few-shot learning strategies are required in the biomedical NLP domain.

【2】 Automatic Online Multi-Source Domain Adaptation
Link: https://arxiv.org/abs/2109.01996

Authors: Renchunzi Xie, Mahardhika Pratama
Affiliation: School of Computer Science and Engineering, Nanyang Technological University, Singapore
Comments: Under consideration for publication in an Elsevier journal
Abstract: Knowledge transfer across several streaming processes remains a challenging problem, not only because of the different distributions of each stream but also because of the rapidly changing and never-ending environments of data streams. Despite growing research achievements in this area, most existing works are developed for a single source domain, which limits their resilience to exploit multiple source domains, which are beneficial for recovering quickly from concept drifts and for avoiding the negative-transfer problem. An online domain adaptation technique under multi-source streaming processes, namely automatic online multi-source domain adaptation (AOMSDA), is proposed in this paper. The online domain adaptation strategy of AOMSDA is formulated under a coupled generative and discriminative approach of a denoising autoencoder (DAE), where a central moment discrepancy (CMD)-based regularizer is integrated to handle the existence of multi-source domains, thereby taking advantage of complementary information sources. Asynchronous concept drifts taking place at different time periods are addressed by a self-organizing structure and a node re-weighting strategy. Our numerical study demonstrates that AOMSDA is capable of outperforming its counterparts in 5 of 8 study cases, while an ablation study depicts the advantage of each learning component. In addition, AOMSDA is general for any number of source streams. The source code of AOMSDA is shared publicly at https://github.com/Renchunzi-Xie/AOMSDA.git.
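
The CMD regularizer mentioned in the abstract has a standard form (Zellinger et al.): match the means and the first K central moments of the source and target feature distributions. Treating this textbook version as AOMSDA's exact regularizer is an assumption.

```python
import torch

def cmd(x_src, x_tgt, n_moments=5):
    """x_src, x_tgt: (batch, features), assumed scaled to a bounded range."""
    mean_s, mean_t = x_src.mean(0), x_tgt.mean(0)
    loss = (mean_s - mean_t).norm()                       # first moment
    cs, ct = x_src - mean_s, x_tgt - mean_t
    for k in range(2, n_moments + 1):                     # central moments 2..K
        loss = loss + (cs.pow(k).mean(0) - ct.pow(k).mean(0)).norm()
    return loss

# total_loss = reconstruction_loss + classification_loss \
#              + lam * sum(cmd(h_target, h_src) for h_src in source_features)
```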

【3】 FewshotQA: A simple framework for few-shot learning of question answering tasks using pre-trained text-to-text models
Link: https://arxiv.org/abs/2109.01951

Authors: Rakesh Chada, Pradeep Natarajan
Affiliation: Amazon Alexa AI
Comments: Accepted to EMNLP 2021 Main Conference
Abstract: The task of learning from only a few examples (called a few-shot setting) is of key importance and relevance to real-world settings. For question answering (QA), the current state-of-the-art pre-trained models typically need fine-tuning on tens of thousands of examples to obtain good results. Their performance degrades significantly in a few-shot setting (<100 examples). To address this, we propose a simple fine-tuning framework that leverages pre-trained text-to-text models and is directly aligned with their pre-training framework. Specifically, we construct the input as a concatenation of the question, a mask token representing the answer span, and the context. Given this input, the model is fine-tuned using the same objective as its pre-training objective. Through experimental studies on various few-shot configurations, we show that this formulation leads to significant gains on multiple QA benchmarks (an absolute gain of 34.2 F1 points on average when there are only 16 training examples). The gains extend further when used with larger models (e.g., 72.3 F1 on SQuAD using BART-large with only 32 examples) and translate well to a multilingual setting. On the multilingual TydiQA benchmark, our model outperforms XLM-RoBERTa-large by an absolute margin of up to 40 F1 points, and by an average of 33 F1 points, in a few-shot setting (<=64 training examples). We conduct detailed ablation studies to analyze the factors contributing to these gains.
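
The input construction is spelled out in the abstract and is easy to reproduce with BART. In the sketch below, the separator choice and the use of the bare answer string as the target are our assumptions.

```python
from transformers import BartTokenizer, BartForConditionalGeneration

tok = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

question = "Who wrote Hamlet?"
context = "Hamlet is a play by William Shakespeare."
answer = "William Shakespeare"

# Input: question + <mask> (standing in for the answer span) + context,
# mirroring BART's denoising pre-training format.
source = f"{question} {tok.mask_token} {context}"
batch = tok(source, return_tensors="pt")
labels = tok(answer, return_tensors="pt").input_ids

loss = model(**batch, labels=labels).loss  # same seq2seq objective as pre-training
loss.backward()
```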

【4】 Robust fine-tuning of zero-shot models 标题:零样本模型的鲁棒微调 链接:https://arxiv.org/abs/2109.01903

作者:Mitchell Wortsman,Gabriel Ilharco,Mike Li,Jong Wook Kim,Hannaneh Hajishirzi,Ali Farhadi,Hongseok Namkoong,Ludwig Schmidt 机构:†University of Washington ‡Columbia University §OpenAI ◦Allen Institute for Artificial Intelligence △Toyota Research 摘要:当执行零样本推断时(即,无需对特定数据集进行微调),大型预训练模型(如CLIP)在一系列数据分布中提供一致的准确性。尽管现有的微调方法大大提高了分布内的准确性,但它们也降低了分布外的鲁棒性。我们通过引入一种简单有效的提高鲁棒性的方法来解决这种紧张关系:对零样本模型和微调模型的权重进行集成。与标准微调相比,由此产生的权重空间集成在分布外提供了较大的精度改进,同时匹配或提高了分布内精度。在ImageNet和五个衍生分布偏移上,权重空间集成将分布外精度提高了2到10个百分点,而与标准微调相比,分布内精度提高了近1个百分点。在微调或推断过程中,这些改进不会产生额外的计算成本。 摘要:Large pre-trained models such as CLIP offer consistent accuracy across a range of data distributions when performing zero-shot inference (i.e., without fine-tuning on a specific dataset). Although existing fine-tuning approaches substantially improve accuracy in-distribution, they also reduce out-of-distribution robustness. We address this tension by introducing a simple and effective method for improving robustness: ensembling the weights of the zero-shot and fine-tuned models. Compared to standard fine-tuning, the resulting weight-space ensembles provide large accuracy improvements out-of-distribution, while matching or improving in-distribution accuracy. On ImageNet and five derived distribution shifts, weight-space ensembles improve out-of-distribution accuracy by 2 to 10 percentage points while increasing in-distribution accuracy by nearly 1 percentage point relative to standard fine-tuning. These improvements come at no additional computational cost during fine-tuning or inference.
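权重空间集成本身非常直观:对零样本模型与微调后模型的同名参数做线性插值。下面是一个 PyTorch 最小示意(插值系数 alpha 为需要选择的超参数,权重文件名仅为示例,并假设参数均为浮点张量):

```python
import torch

def weight_space_ensemble(zeroshot_state, finetuned_state, alpha=0.5):
    """theta = (1 - alpha) * theta_zeroshot + alpha * theta_finetuned。
    假设两组 state_dict 来自同一架构、键完全一致且均为浮点参数。"""
    return {
        k: (1 - alpha) * zeroshot_state[k] + alpha * finetuned_state[k]
        for k in zeroshot_state
    }

# 用法示例(文件名为假设):
# merged = weight_space_ensemble(torch.load("zeroshot.pt"), torch.load("finetuned.pt"))
# model.load_state_dict(merged)
```

由于只对权重做一次插值,推断时的计算量与单个模型完全相同,这与摘要中"无额外计算成本"的说法一致。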

【5】 Robust Importance Sampling for Error Estimation in the Context of Optimal Bayesian Transfer Learning 标题:最优贝叶斯转移学习背景下误差估计的鲁棒重要性抽样 链接:https://arxiv.org/abs/2109.02150

作者:Omar Maddouri,Xiaoning Qian,Francis J. Alexander,Edward R. Dougherty,Byung-Jun Yoon 摘要:分类一直是构建智能系统的一项主要任务,因为它能够在不确定性条件下进行决策。分类器设计的目的是从训练数据中建立模型,以显式或隐式地表示特征标签分布。在许多科学或临床环境中,训练数据通常是有限的,这使得设计准确的分类器和评估其分类错误极具挑战性。虽然转移学习(transfer learning,TL)可以通过合并来自相关源领域的数据来改善不同目标领域的学习来缓解这一问题,但它在性能评估方面很少受到关注,尤其是在错误估计方面。在本文中,我们通过在贝叶斯范式下研究分类误差估计背景下的知识转移来填补这一空白。我们引入了一类新的贝叶斯最小均方误差(MMSE)估计器用于最优贝叶斯转移学习(OBTL),该估计器能够在小样本环境下严格评估不确定性条件下的分类误差。使用蒙特卡罗重要性抽样,我们使用所提出的估计器来评估广泛的分类器家族的分类精度,这些分类器跨越了不同的学习能力。基于合成数据和真实世界RNA测序(RNA-seq)数据的实验结果表明,我们提出的OBTL误差估计方案通过利用其他相关领域的数据明显优于标准误差估计方法,尤其是在小样本环境下。 摘要:Classification has been a major task for building intelligent systems as it enables decision-making under uncertainty. Classifier design aims at building models from training data for representing feature-label distributions--either explicitly or implicitly. In many scientific or clinical settings, training data are typically limited, which makes designing accurate classifiers and evaluating their classification error extremely challenging. While transfer learning (TL) can alleviate this issue by incorporating data from relevant source domains to improve learning in a different target domain, it has received little attention for performance assessment, notably in error estimation. In this paper, we fill this gap by investigating knowledge transferability in the context of classification error estimation within a Bayesian paradigm. We introduce a novel class of Bayesian minimum mean-square error (MMSE) estimators for optimal Bayesian transfer learning (OBTL), which enables rigorous evaluation of classification error under uncertainty in a small-sample setting. Using Monte Carlo importance sampling, we employ the proposed estimator to evaluate the classification accuracy of a broad family of classifiers that span diverse learning capabilities. Experimental results based on both synthetic data as well as real-world RNA sequencing (RNA-seq) data show that our proposed OBTL error estimation scheme clearly outperforms standard error estimators, especially in a small-sample setting, by tapping into the data from other relevant domains.
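作为背景,蒙特卡罗重要性采样的基本机制如下:从易采样的建议分布 q 抽样,再用权重 p/q 校正期望估计。下面是一个与 OBTL 估计器的具体后验形式无关的最小示意(分布选择纯属举例):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = stats.norm(0.0, 1.0)     # 目标分布 p(代替难以直接采样的分布)
q = stats.norm(0.5, 1.5)     # 建议分布 q,需覆盖 p 的支撑

x = rng.normal(0.5, 1.5, size=100_000)   # 从 q 采样
w = p.pdf(x) / q.pdf(x)                  # 重要性权重

f = x ** 2                               # 估计 E_p[X^2],真值为 1
print((w * f).mean())                    # 普通重要性采样估计
print((w * f).sum() / w.sum())           # 自归一化版本,通常更稳健
```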

强化学习(6篇)

【1】 Guiding Global Placement With Reinforcement Learning 标题:用强化学习指导全局安置 链接:https://arxiv.org/abs/2109.02631

作者:Robert Kirby,Kolby Nottingham,Rajarshi Roy,Saad Godil,Bryan Catanzaro 机构:Nvidia, Santa Clara, CA, USA, University of California Irvine, Irvine, CA, USA 摘要:GPU加速全局和细节布局的最新进展将解决问题的时间缩短了一个数量级。这一进步使我们能够利用数据驱动的优化(如强化学习),努力提高最终的布局结果质量。在这项工作中,我们使用强化学习代理增强了最先进的、基于力的全局布局求解器,该代理经过训练以改进最终细节布局的半周长线长(HPWL)。我们提出了一种新的控制方案,对布局过程进行全局或局部控制。然后,我们训练强化学习代理使用这些控制来引导布局得到改进的解决方案。在这两种情况下,增强后的优化器都能找到改进的布局解决方案。我们训练得到的代理在一系列学术基准上将最终细节布局HPWL平均提高1%,在真实工业设计上将全局布局HPWL平均提高1%以上。 摘要:Recent advances in GPU accelerated global and detail placement have reduced the time to solution by an order of magnitude. This advancement allows us to leverage data driven optimization (such as Reinforcement Learning) in an effort to improve the final quality of placement results. In this work we augment state-of-the-art, force-based global placement solvers with a reinforcement learning agent trained to improve the final detail placed Half Perimeter Wire Length (HPWL). We propose novel control schemes with either global or localized control of the placement process. We then train reinforcement learning agents to use these controls to guide placement to improved solutions. In both cases, the augmented optimizer finds improved placement solutions. Our trained agents achieve an average 1% improvement in final detail place HPWL across a range of academic benchmarks and more than 1% in global place HPWL on real industry designs.
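作为背景,优化目标HPWL(半周长线长)本身的计算非常直接:对每条线网,取其所有引脚包围盒的半周长并求和。下面是一个最小示意:

```python
def hpwl(nets):
    """nets:每条线网是一组引脚坐标 [(x, y), ...]。
    HPWL = 对每条线网求 (max_x - min_x) + (max_y - min_y) 后累加。"""
    total = 0.0
    for pins in nets:
        xs = [x for x, _ in pins]
        ys = [y for _, y in pins]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

print(hpwl([[(0, 0), (3, 4)], [(1, 1), (2, 5), (4, 2)]]))  # 7 + 7 = 14
```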

【2】 Delving into Macro Placement with Reinforcement Learning 标题:基于强化学习的宏观布局研究 链接:https://arxiv.org/abs/2109.02587

作者:Zixuan Jiang,Ebrahim Songhori,Shen Wang,Anna Goldie,Azalia Mirhoseini,Joe Jiang,Young-Joon Lee,David Z. Pan 机构:The University of Texas at Austin, Austin, TX US, Google, Mountain View, CA US 备注:Accepted at 3rd ACM/IEEE Workshop on Machine Learning for CAD (MLCAD) 摘要:在物理设计中,人类设计师通常通过试错来放置宏,这是一个马尔可夫决策过程。强化学习(RL)方法在宏布局上表现出超人的性能。在本文中,我们建议对之前的工作进行扩展(Mirhoseini等人,2020年)。我们首先描述了策略和价值网络体系结构的细节。我们用DREAMPlace替换了力导向方法,用于在RL环境中放置标准单元。我们还在公共基准上将改进后的方法与其他学术布局器进行了比较。 摘要:In physical design, human designers typically place macros via trial and error, which is a Markov decision process. Reinforcement learning (RL) methods have demonstrated superhuman performance on the macro placement. In this paper, we propose an extension to this prior work (Mirhoseini et al., 2020). We first describe the details of the policy and value network architecture. We replace the force-directed method with DREAMPlace for placing standard cells in the RL environment. We also compare our improved method with other academic placers on public benchmarks.

【3】 Hindsight Reward Tweaking via Conditional Deep Reinforcement Learning 标题:基于条件深度强化学习的后见之明奖励调整 链接:https://arxiv.org/abs/2109.02332

作者:Ning Wei,Jiahua Liang,Di Xie,Shiliang Pu 机构: Pu is with Hikvision Research Institute 摘要:在强化学习(RL)中,设计最优的奖励函数一直是人们所期望的,但却极为困难。当涉及到现代复杂任务时,复杂的奖励函数被广泛用于简化策略学习,但由于训练成本急剧增加,即使对其进行微小调整,评估成本也很高。为此,我们提出了一种事后奖励调整方法,通过设计一种新的深度强化学习范式,在接近最优的空间内建模奖励函数的影响。我们简单地用一个与有效环境奖励参数线性相关的条件向量扩展输入观测,并以常规方式训练模型(区别仅在于随机化奖励配置),从而获得一个超策略,其特性可在条件空间上被灵敏地调节。我们证明了该方法的可行性,并在多个MuJoCo任务上研究了其在策略性能提升方面的一个潜在应用。 摘要:Designing optimal reward functions has been desired but extremely difficult in reinforcement learning (RL). When it comes to modern complex tasks, sophisticated reward functions are widely used to simplify policy learning yet even a tiny adjustment on them is expensive to evaluate due to the drastically increasing cost of training. To this end, we propose a hindsight reward tweaking approach by designing a novel paradigm for deep reinforcement learning to model the influences of reward functions within a near-optimal space. We simply extend the input observation with a condition vector linearly correlated with the effective environment reward parameters and train the model in a conventional manner except for randomizing reward configurations, obtaining a hyper-policy whose characteristics are sensitively regulated over the condition space. We demonstrate the feasibility of this approach and study one of its potential application in policy performance boosting with multiple MuJoCo tasks.
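"把奖励参数作为条件向量拼接进观测、并在训练中随机化奖励配置"的做法可以写成一个简单的交互循环。下面是一个最小示意(假设旧版 gym 风格接口:reset() 返回观测、step() 返回四元组;奖励由标量参数 w 线性调制,这是对论文设定的刻意简化):

```python
import numpy as np

def conditioned_episode(env, policy, w_low=0.5, w_high=1.5):
    """单回合交互:随机采样奖励参数 w,将其作为条件向量拼接进观测,
    并用 w 重新加权环境奖励;返回本回合的调制回报。"""
    w = np.random.uniform(w_low, w_high)      # 随机奖励配置
    obs = np.append(env.reset(), w)           # 观测 || 条件向量
    done, ret = False, 0.0
    while not done:
        next_obs, reward, done, _ = env.step(policy(obs))
        ret += w * reward                     # 由 w 线性调制的有效奖励
        obs = np.append(next_obs, w)
    return ret
```

训练收敛后,只需在推断时改变拼接进观测的 w,即可在不重新训练的情况下"事后"考察不同奖励配置下的策略行为。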

【4】 Temporal Aware Deep Reinforcement Learning 标题:时序感知深度强化学习 链接:https://arxiv.org/abs/2109.02145

作者:Deepak-George Thomas 机构:Iowa State University 摘要:传统的基于图像的深度强化学习(DRL)算法所使用的函数逼近器通常缺乏时间学习成分,而侧重于学习空间成分。我们提出了一种同时学习时间和空间成分的技术。我们的方法以一个通用DQN为基线进行了测试,在最大回报和样本复杂度方面均优于后者。该算法在机器人学和顺序决策领域都有应用。 摘要:The function approximators employed by traditional image based Deep Reinforcement Learning (DRL) algorithms usually lack a temporal learning component and instead focus on learning the spatial component. We propose a technique wherein both temporal as well as spatial components are jointly learned. Our technique was tested with a generic DQN and it outperformed it in terms of maximum rewards as well as sample complexity. This algorithm has implications in the robotics as well as sequential decision making domains.

【5】 Eden: A Unified Environment Framework for Booming Reinforcement Learning Algorithms 标题:EDEN:一个蓬勃发展的强化学习算法的统一环境框架 链接:https://arxiv.org/abs/2109.01768

作者:Ruizhi Chen,Xiaoyu Wu,Yansong Pan,Kaizhao Yuan,Ling Li,TianYun Ma,JiYuan Liang,Rui Zhang,Kai Wang,Chen Zhang,Shaohui Peng,Xishan Zhang,Zidong Du,Qi Guo,Yunji Chen 机构: Institute of Software Chinese Academy of Sciences, Institute of Computing and Technology Chinese Academy of Sciences, University of Science and Technology of China 备注:19 pages,16 figures 摘要:随着AlphaGo击败顶尖的人类玩家,强化学习(RL)算法逐渐成为构建更强人工智能(AI)的代码库。RL算法的设计首先需要适应特定的环境,因此设计的环境引导着RL算法的快速深入发展。然而,现有的环境,可以分为现实世界的游戏和定制的玩具环境,有明显的缺点。对于真实世界的游戏,它是为人类娱乐而设计的,对于大多数RL研究人员来说太难了。对于定制的玩具环境,对于所有的RL算法没有广泛接受的统一评估标准。因此,我们为RL介绍了第一个虚拟用户友好环境框架。在这个框架中,环境可以很容易地配置来实现主流研究中的各种RL任务。然后可以方便地评估和比较所有主流的最新(SOTA)RL算法。因此,我们的贡献主要包括以下几个方面:1.为SOTA RL算法的所有分类提供单一配置环境;2.多个分类RL算法的组合环境;3.各种RL算法的评价标准。通过这些努力,培育出一种在各种任务中具有综合能力的人工智能成为可能,也许它将为人工智能打开一个新的篇章。 摘要:With AlphaGo defeats top human players, reinforcement learning(RL) algorithms have gradually become the code-base of building stronger artificial intelligence(AI). The RL algorithm design firstly needs to adapt to the specific environment, so the designed environment guides the rapid and profound development of RL algorithms. However, the existing environments, which can be divided into real world games and customized toy environments, have obvious shortcomings. For real world games, it is designed for human entertainment, and too much difficult for most of RL researchers. For customized toy environments, there is no widely accepted unified evaluation standard for all RL algorithms. Therefore, we introduce the first virtual user-friendly environment framework for RL. In this framework, the environment can be easily configured to realize all kinds of RL tasks in the mainstream research. Then all the mainstream state-of-the-art(SOTA) RL algorithms can be conveniently evaluated and compared. Therefore, our contributions mainly includes the following aspects: 1.single configured environment for all classification of SOTA RL algorithms; 2.combined environment of more than one classification RL algorithms; 3.the evaluation standard for all kinds of RL algorithms. With all these efforts, a possibility for breeding an AI with capability of general competency in a variety of tasks is provided, and maybe it will open up a new chapter for AI.

【6】 Reinforcement Learning for Battery Energy Storage Dispatch augmented with Model-based Optimizer 标题:基于模型优化的强化学习在蓄电池储能调度中的应用 链接:https://arxiv.org/abs/2109.01659

作者:Gayathri Krishnamoorthy,Anamika Dubey 机构:Assefaw H. Gebremedhin, Member, IEEE 摘要:强化学习在解决配电系统最优潮流(OPF)问题中非常有用。然而,使用基本上无模型的强化学习算法完全忽略了基于物理的电网建模,这会损害优化器的性能,并带来可扩展性挑战。本文提出了一种新的方法,将基于物理的模型与基于学习的算法协同地结合起来,使用模仿学习来解决配电级OPF问题。具体而言,我们提出了基于模仿学习的深度强化学习(DRL)方法改进,以解决配电系统中电池存储调度的特定情况下的OPF问题。该模仿学习算法利用线性化的基于模型的OPF求解器得到的近似最优解,为DRL算法提供了良好的初始策略,同时提高了训练效率。利用IEEE 34母线和123母线配电馈线和多个配电级蓄电池储能系统,证明了该方法的有效性。 摘要:Reinforcement learning has been found useful in solving optimal power flow (OPF) problems in electric power distribution systems. However, the use of largely model-free reinforcement learning algorithms that completely ignore the physics-based modeling of the power grid compromises the optimizer performance and poses scalability challenges. This paper proposes a novel approach to synergistically combine the physics-based models with learning-based algorithms using imitation learning to solve distribution-level OPF problems. Specifically, we propose imitation learning based improvements in deep reinforcement learning (DRL) methods to solve the OPF problem for a specific case of battery storage dispatch in the power distribution systems. The proposed imitation learning algorithm uses the approximate optimal solutions obtained from a linearized model-based OPF solver to provide a good initial policy for the DRL algorithms while improving the training efficiency. The effectiveness of the proposed approach is demonstrated using IEEE 34-bus and 123-bus distribution feeders with numerous distribution-level battery storage systems.
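其核心思想可概括为:先用基于线性化模型的 OPF 求解器产生的(状态, 近似最优动作)对做行为克隆,得到较好的初始策略,再交给 DRL 继续训练。下面是行为克隆预训练部分的 PyTorch 最小示意(网络结构、数据维度均为假设,随机张量代替真实电网数据):

```python
import torch
import torch.nn as nn

# 假设:states 为电网状态特征,expert_actions 为线性化 OPF 求解器给出的近似最优储能调度
states = torch.randn(1024, 16)
expert_actions = torch.randn(1024, 4)

policy = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 4))
opt = torch.optim.Adam(policy.parameters(), lr=1e-3)

for epoch in range(50):          # 行为克隆:最小化与专家动作的均方误差
    loss = nn.functional.mse_loss(policy(states), expert_actions)
    opt.zero_grad()
    loss.backward()
    opt.step()
# 预训练好的 policy 再作为 DRL 算法的初始策略继续在环境中训练
```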

分层学习(1篇)

【1】 Hierarchical 3D Feature Learning for Pancreas Segmentation 标题:层次化三维特征学习在胰腺分割中的应用 链接:https://arxiv.org/abs/2109.01667

作者:Federica Proietto Salanitri,Giovanni Bellitto,Ismail Irmakci,Simone Palazzo,Ulas Bagci,Concetto Spampinato 机构: PeRCeiVe Lab, University of Catania, Italy., CE, Ege University, Izmir, Turkey, Department of Radiology and BME, Northwestern University, Chicago, IL, USA 摘要:我们提出了一种新的3D全卷积深度网络,用于从MRI和CT扫描中自动分割胰腺。更具体地说,该模型由一个3D编码器组成,该编码器学习在不同尺度下提取体积特征;在编码器层次结构的不同点获取的特征然后被发送到多个3D解码器,这些解码器分别预测中间分割图。最后,将所有分割图合并以获得一张精细的最终分割掩模。我们在CT和MRI成像数据上测试了我们的模型:公开的NIH胰腺CT数据集(包括82个对比增强CT)和私有的MRI数据集(包括40个MRI扫描)。实验结果表明,我们的模型在CT胰腺分割方面优于现有方法,平均Dice分数约为88%,并且在极具挑战性的MRI数据集(平均Dice分数约为77%)上具有良好的分割性能。额外的对照实验表明,所获得的性能源于我们的3D全卷积深度网络与分层表示解码的结合,从而证实了我们的架构设计。 摘要:We propose a novel 3D fully convolutional deep network for automated pancreas segmentation from both MRI and CT scans. More specifically, the proposed model consists of a 3D encoder that learns to extract volume features at different scales; features taken at different points of the encoder hierarchy are then sent to multiple 3D decoders that individually predict intermediate segmentation maps. Finally, all segmentation maps are combined to obtain a unique detailed segmentation mask. We test our model on both CT and MRI imaging data: the publicly available NIH Pancreas-CT dataset (consisting of 82 contrast-enhanced CTs) and a private MRI dataset (consisting of 40 MRI scans). Experimental results show that our model outperforms existing methods on CT pancreas segmentation, obtaining an average Dice score of about 88%, and yields promising segmentation performance on a very challenging MRI data set (average Dice score is about 77%). Additional control experiments demonstrate that the achieved performance is due to the combination of our 3D fully-convolutional deep network and the hierarchical representation decoding, thus substantiating our architectural design.

医学相关(14篇)

【1】 Severity and Mortality Prediction Models to Triage Indian COVID-19 Patients 标题:用于分诊印度冠状病毒患者的严重程度和死亡率预测模型 链接:https://arxiv.org/abs/2109.02485

作者:Samarth Bhatia,Yukti Makhija,Shalendra Singh,Ishaan Gupta 机构:Affiliations, Department of Chemical engineering, Indian Institute of Technology, Delhi, Hauz Khas, New, Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology, Delhi, Hauz Khas, New Delhi-,. 备注:31 pages, 6 figures, 8 tables. The first two authors (SB and YM) have equal contribution. IG is the corresponding author (ishaan@iitd.ac.in) 摘要:随着印度第二波疫情的缓解,新冠病毒-19已经在全国范围内感染了约2900万患者,导致35万多人死亡。随着感染人数激增,该国医疗基础设施的压力变得明显。虽然该国为其人口接种疫苗,但开放经济可能导致感染率上升。在这种情况下,必须通过基于临床参数的知情患者分诊系统有效利用有限的医院资源。在这里,我们提出了两个可解释的机器学习模型,根据入院当天对印度最大患者队列之一的血液参数进行常规无创监测,预测患者的临床结果、严重程度和死亡率。患者严重程度和死亡率预测模型的准确率分别为86.3%和88.06%,AUC-ROC为0.91和0.92。我们已将这两种模型集成到用户友好的web应用计算器中,https://triage-COVID-19.herokuapp.com/,以展示大规模部署此类工作的潜力。 摘要:As the second wave in India mitigates, COVID-19 has now infected about 29 million patients countrywide, leading to more than 350 thousand people dead. As the infections surged, the strain on the medical infrastructure in the country became apparent. While the country vaccinates its population, opening up the economy may lead to an increase in infection rates. In this scenario, it is essential to effectively utilize the limited hospital resources by an informed patient triaging system based on clinical parameters. Here, we present two interpretable machine learning models predicting the clinical outcomes, severity, and mortality, of the patients based on routine non-invasive surveillance of blood parameters from one of the largest cohorts of Indian patients at the day of admission. Patient severity and mortality prediction models achieved 86.3% and 88.06% accuracy, respectively, with an AUC-ROC of 0.91 and 0.92. We have integrated both the models in a user-friendly web app calculator, https://triage-COVID-19.herokuapp.com/, to showcase the potential deployment of such efforts at scale.

【2】 A Dynamic Topic Identification and Labeling Approach of COVID-19 Tweets 标题:一种冠状病毒推文的动态话题识别和标注方法 链接:https://arxiv.org/abs/2109.02462

作者:Khandaker Tayef Shahriar,Iqbal H. Sarker,Muhammad Nazrul Islam,Mohammad Ali Moni 机构:Department of Computer Science & Engineering, Chittagong University of Engineering &, Technology, Chittagong-, Bangladesh., Department of Computer Science and Engineering, Military Institute of Science and, Technology, Dhaka-, Bangladesh. 摘要:本文阐述了从新冠病毒-19推文中动态识别具有适当标签的关键主题的问题,以提供更广泛公众舆论的概述。如今,社交媒体是通过互联网技术将人们联系起来的最佳方式之一,它也被认为是我们日常生活中不可或缺的一部分。2019年12月底,新型冠状病毒COVID-19爆发,由于其在世界各地迅速传播,世界卫生组织宣布进入紧急状态。新冠肺炎已经影响到全球许多人对社交媒体的使用。推特是最具影响力的社交媒体服务之一,在这场疫情中,推特的使用量急剧增加。因此,从新冠病毒-19的推文中动态提取带有标签的特定主题以突出对话(而非采用手动主题标注方法)是一个具有挑战性的问题。在本文中,我们提出了一个框架,该框架利用潜在狄利克雷分配(LDA)所生成主题的方面词簇中排名最高的一元(Unigram)特征,自动识别推文中带有标签的关键主题。我们的实验结果表明,这种动态主题识别和标注方法是有效的,相对于手动静态方法,准确率为85.48%。 摘要:This paper formulates the problem of dynamically identifying key topics with proper labels from COVID-19 Tweets to provide an overview of wider public opinion. Nowadays, social media is one of the best ways to connect people through Internet technology, which is also considered an essential part of our daily lives. In late December 2019, an outbreak of the novel coronavirus, COVID-19 was reported, and the World Health Organization declared an emergency due to its rapid spread all over the world. The COVID-19 epidemic has affected the use of social media by many people across the globe. Twitter is one of the most influential social media services, which has seen a dramatic increase in its use from the epidemic. Thus dynamic extraction of specific topics with labels from tweets of COVID-19 is a challenging issue for highlighting conversation instead of manual topic labeling approach. In this paper, we propose a framework that automatically identifies the key topics with labels from the tweets using the top Unigram feature of aspect terms cluster from Latent Dirichlet Allocation (LDA) generated topics. Our experiment result shows that this dynamic topic identification and labeling approach is effective having the accuracy of 85.48% with respect to the manual static approach.
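"LDA 主题 + 排名最高的一元特征作为标签"的思路用 scikit-learn 几行即可拼出,下面是一个最小示意(语料、主题数等均为示例性设置,并非论文使用的真实推文数据):

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

tweets = [
    "wear a mask and stay home",
    "vaccine rollout starts next week",
    "hospital beds are running out",
    "second vaccine dose appointment booked",
]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(tweets)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    label = terms[topic.argmax()]   # 以该主题权重最高的一元词作为标签
    print(f"topic {k}: label = {label}")
```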

【3】 Parkinson's Disease Diagnosis based on Gait Cycle Analysis Through an Interpretable Interval Type-2 Neuro-Fuzzy System 标题:基于可解释区间二型神经模糊系统步态周期分析的帕金森病诊断 链接:https://arxiv.org/abs/2109.02442

作者:Armin Salimi-Badr,Mohammad Hashemi,Hamidreza Saffari 机构:Shahid Beheshti University, Tehran, Iran 摘要:本文在分析步态周期的基础上,提出了一种基于区间二型模糊神经网络的可解释分类器,用于检测帕金森病患者。该方法利用从垂直地面反作用力(vGRF)中提取的临床特征(通过放置在受试者鞋底的16个可穿戴传感器测量),并学习可解释的模糊规则。因此,专家可以通过考察可解释模糊规则的激活强度,验证所提出方法做出的决策。此外,专家还可以利用提取的模糊规则对患者进行诊断,或根据自己的知识对规则进行调整。为了提高该方法对不确定性和噪声传感器测量的鲁棒性,采用了区间二型模糊逻辑。为了学习模糊规则,提出了两种范式:1)采用基于对可用样本进行聚类的批量学习方法提取初始模糊规则;2)提出一种互补的在线学习方法,在遇到新标记样本时改进规则库。评估了该方法在不同条件下(包括存在噪声或观察到新实例)对患者和健康受试者进行分类的性能。此外,还将该模型的性能与以前的一些有监督和无监督机器学习方法进行了比较。该方法的最终准确率、精确率、召回率和F1得分分别为88.74%、89.41%、95.10%和92.16%。最后,报告了每个特征所提取的模糊集。 摘要:In this paper, an interpretable classifier using an interval type-2 fuzzy neural network for detecting patients suffering from Parkinson's Disease (PD) based on analyzing the gait cycle is presented. The proposed method utilizes clinical features extracted from the vertical Ground Reaction Force (vGRF), measured by 16 wearable sensors placed in the soles of subjects' shoes and learns interpretable fuzzy rules. Therefore, experts can verify the decision made by the proposed method based on investigating the firing strength of interpretable fuzzy rules. Moreover, experts can utilize the extracted fuzzy rules for patient diagnosing or adjust them based on their knowledge. To improve the robustness of the proposed method against uncertainty and noisy sensor measurements, Interval Type-2 Fuzzy Logic is applied. To learn fuzzy rules, two paradigms are proposed: 1- A batch learning approach based on clustering available samples is applied to extract initial fuzzy rules, 2- A complementary online learning is proposed to improve the rule base encountering new labeled samples. The performance of the method is evaluated for classifying patients and healthy subjects in different conditions including the presence of noise or observing new instances. Moreover, the performance of the model is compared to some previous supervised and unsupervised machine learning approaches. The final Accuracy, Precision, Recall, and F1 Score of the proposed method are 88.74%, 89.41%, 95.10%, and 92.16%. Finally, the extracted fuzzy sets for each feature are reported.

【4】 Developing and validating multi-modal models for mortality prediction in COVID-19 patients: a multi-center retrospective study 标题:建立和验证预测冠状病毒患者死亡率的多模式模型:一项多中心回顾性研究 链接:https://arxiv.org/abs/2109.02439

作者:Joy Tzung-yu Wu,Miguel Ángel Armengol de la Hoz,Po-Chih Kuo,Joseph Alexander Paguio,Jasper Seth Yao,Edward Christopher Dee,Wesley Yeung,Jerry Jurado,Achintya Moulick,Carmelo Milazzo,Paloma Peinado,Paula Villares,Antonio Cubillo,José Felipe Varona,Hyung-Chul Lee,Alberto Estirado,José Maria Castellano,Leo Anthony Celi 机构: Department of Radiology and Nuclear Medicine, Stanford University, Palo Alto, CA, United States., Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, United 摘要:由新冠病毒-19大流行带来的前所未有的全球危机引发了许多努力,以帮助卫生系统分配资源为目标,创建用于检测和预测SARS-CoV-2感染的预测模型。特别是机器学习模型,其利用患者临床信息和医学图像进行预测的能力具有很大的潜力。然而,由于方法上的缺陷和缺乏适当的验证,迄今为止大多数已发表的新冠病毒-19预测模型几乎没有临床应用价值。在本文中,我们描述了利用多中心患者数据开发和验证新冠病毒-19死亡率预测的多模式模型的方法。使用来自西班牙马德里的回顾性数据(N=2547)开发了新冠病毒-19死亡率预测模型,并在来自美国新泽西州一家社区医院(N=242)和韩国首尔一家学术中心(N=336)的患者队列中进行了外部验证。我们开发的模型在不同的临床环境中表现不同,强调了在采用机器学习进行临床决策时需要指导策略。我们证明,使用结构化电子健康记录和胸部X射线成像数据的特征可以在所有三个数据集(受试者工作特征曲线下面积:0.85(95%置信区间:0.83-0.87)、0.76(0.70-0.82)和0.95(0.92-0.98))中获得更好的30天死亡率预测性能。我们讨论了在开发模型的每个步骤中所做决策的基本原理,并向研究社区提供了我们的代码。我们采用最佳机器学习实践进行临床模型开发。我们的目标是创建一个工具包,帮助调查人员和组织构建用于预测、分类和/或优化的多模式模型。 摘要:The unprecedented global crisis brought about by the COVID-19 pandemic has sparked numerous efforts to create predictive models for the detection and prognostication of SARS-CoV-2 infections with the goal of helping health systems allocate resources. Machine learning models, in particular, hold promise for their ability to leverage patient clinical information and medical images for prediction. However, most of the published COVID-19 prediction models thus far have little clinical utility due to methodological flaws and lack of appropriate validation. In this paper, we describe our methodology to develop and validate multi-modal models for COVID-19 mortality prediction using multi-center patient data. The models for COVID-19 mortality prediction were developed using retrospective data from Madrid, Spain (N=2547) and were externally validated in patient cohorts from a community hospital in New Jersey, USA (N=242) and an academic center in Seoul, Republic of Korea (N=336). The models we developed performed differently across various clinical settings, underscoring the need for a guided strategy when employing machine learning for clinical decision-making. We demonstrated that using features from both the structured electronic health records and chest X-ray imaging data resulted in better 30-day-mortality prediction performance across all three datasets (areas under the receiver operating characteristic curves: 0.85 (95% confidence interval: 0.83-0.87), 0.76 (0.70-0.82), and 0.95 (0.92-0.98)). We discuss the rationale for the decisions made at every step in developing the models and have made our code available to the research community. We employed the best machine learning practices for clinical model development. Our goal is to create a toolkit that would assist investigators and organizations in building multi-modal models for prediction, classification and/or optimization.

【5】 Less is More: Lighter and Faster Deep Neural Architecture for Tomato Leaf Disease Classification 标题:少即是多:用于番茄叶部病害分类的更轻、更快的深层神经结构 链接:https://arxiv.org/abs/2109.02394

作者:Sabbir Ahmed,Md. Bakhtiar Hasan,Tasnim Ahmed,Redwan Karim Sony,Md. Hasanul Kabir 机构:Department of Computer Science and Engineering, Islamic University of Technology, Dhaka, Bangladesh 备注:31 pages, 11 figures, 5 tables, Submitted to Computer and Electronics in Agriculture 摘要:为了确保全球粮食安全和利益相关者的整体利益,正确检测和分类植物疾病的重要性至关重要。在这方面,基于深度学习的图像分类的出现引入了大量的解决方案。然而,这些解决方案在低端设备中的适用性需要快速、准确且计算成本低廉的系统。本研究提出了一种基于轻量级转移学习的番茄叶片病害检测方法。它利用一种有效的预处理方法,通过光照校正增强叶片图像,以改进分类。我们的系统使用一个组合模型提取特征,该模型由一个预训练的MobileNetV2体系结构和一个分类器网络组成,用于有效预测。传统的扩充方法被运行时扩充所取代,以避免数据泄漏并解决类不平衡问题。对PlantVillage数据集的番茄叶片图像的评估表明,所提出的体系结构在模型大小为9.60MB和4.87M浮点运算的情况下达到了99.30%的准确率,使其适合于低端设备中的实际应用。我们的代码和模型将在发布时提供。 摘要:To ensure global food security and the overall profit of stakeholders, the importance of correctly detecting and classifying plant diseases is paramount. In this connection, the emergence of deep learning-based image classification has introduced a substantial number of solutions. However, the applicability of these solutions in low-end devices requires fast, accurate, and computationally inexpensive systems. This work proposes a lightweight transfer learning-based approach for detecting diseases from tomato leaves. It utilizes an effective preprocessing method to enhance the leaf images with illumination correction for improved classification. Our system extracts features using a combined model consisting of a pretrained MobileNetV2 architecture and a classifier network for effective prediction. Traditional augmentation approaches are replaced by runtime augmentation to avoid data leakage and address the class imbalance issue. Evaluation on tomato leaf images from the PlantVillage dataset shows that the proposed architecture achieves 99.30% accuracy with a model size of 9.60MB and 4.87M floating-point operations, making it a suitable choice for real-life applications in low-end devices. Our codes and models will be made available upon publication.

【6】 Fairness via AI: Bias Reduction in Medical Information 标题:通过人工智能实现公平:减少医疗信息中的偏见 链接:https://arxiv.org/abs/2109.02202

作者:Shiri Dori-Hacohen,Roberto Montenegro,Fabricio Murai,Scott A. Hale,Keen Sung,Michela Blain,Jennifer Edwards-Johnson 机构:Additional Key Words and Phrases: fairness in AI, health misinformation, bias reduction 备注:To appear in: The 4th FAccTRec Workshop on Responsible Recommendation at RecSys 2021 摘要:人工智能研究中的大多数公平性都集中在暴露人工智能系统中的偏见上。从更广泛的角度来看公平,人工智能可以服务于更大的愿望:从根源上铲除社会不平等。具体而言,我们关注健康信息中的不平等,并旨在使用人工智能减少该领域的偏见。搜索引擎和社交媒体背后的人工智能算法(其中许多基于推荐系统)对在线医疗卫生信息的质量产生了巨大的影响。因此,在这些在线提供医疗和健康内容的推荐系统中嵌入偏差检测和减少可能会对患者结果和福祉产生巨大的积极影响。在这篇论文中,我们做出了以下贡献:(1)受医学教育、社会学和反种族主义的启发,我们提出了一个通过人工智能实现公平的新框架;(2)我们定义了一个新的术语bisinformation,它与错误信息相关但不同于错误信息,并鼓励研究人员对其进行研究;(3)我们建议使用人工智能来研究、检测和缓解偏颇、有害和/或虚假的健康信息,这些信息对社会中的少数群体造成了不成比例的伤害;(4)我们提出了几个支柱,并提出了几个开放性问题,以便在这一新空间中开展研究。虽然这项工作的第(3)部分特别关注健康领域,但通过人工智能减少偏见和公平的研究工作所产生的基础计算机科学进步和贡献在社会的所有领域都有广泛的影响。 摘要:Most Fairness in AI research focuses on exposing biases in AI systems. A broader lens on fairness reveals that AI can serve a greater aspiration: rooting out societal inequities from their source. Specifically, we focus on inequities in health information, and aim to reduce bias in that domain using AI. The AI algorithms under the hood of search engines and social media, many of which are based on recommender systems, have an outsized impact on the quality of medical and health information online. Therefore, embedding bias detection and reduction into these recommender systems serving up medical and health content online could have an outsized positive impact on patient outcomes and wellbeing. In this position paper, we offer the following contributions: (1) we propose a novel framework of Fairness via AI, inspired by insights from medical education, sociology and antiracism; (2) we define a new term, bisinformation, which is related to, but distinct from, misinformation, and encourage researchers to study it; (3) we propose using AI to study, detect and mitigate biased, harmful, and/or false health information that disproportionately hurts minority groups in society; and (4) we suggest several pillars and pose several open problems in order to seed inquiry in this new space. While part (3) of this work specifically focuses on the health domain, the fundamental computer science advances and contributions stemming from research efforts in bias reduction and Fairness via AI have broad implications in all areas of society.

【7】 Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment 标题:字区域对齐提高胸部X线与放射学报告的联合学习 链接:https://arxiv.org/abs/2109.01949

作者:Zhanghexuan Ji,Mohammad Abuzar Shaikh,Dana Moukheiber,Sargur Srihari,Yifan Peng,Mingchen Gao 机构: Department of Computer Science and Engineering, University at Buffalo, The State University of New York, Buffalo, NY, USA, Population Health Sciences, Weill Cornell Medicine, New York, NY, USA 备注:10 Pages, 1 Figure, 3 Tables, Accepted in 12th Machine Learning in Medical Imaging (MLMI 2021) workshop 摘要:自我监督学习提供了一个机会,探索未标记的胸部X光片及其相关的自由文本报告,这些报告在临床常规中积累,无需人工监督。本文提出了一个联合图像文本表示学习网络(JoImTeRNet),用于胸部X射线图像及其放射学报告的预训练。该模型在全局图像句子水平和局部图像区域单词水平上进行预训练,以进行视觉文本匹配。两者都受到基于交叉熵和基于排名的三重匹配损失的双向约束。区域词匹配是使用注意机制计算的,不需要直接监督它们的映射。预先训练的多模态表示学习为涉及图像和/或文本编码的下游任务铺平了道路。我们通过在两个数据集:OpenI IU和MIMIC-CXR上的跨模态检索和多标签分类来证明表示学习的质量 摘要:Self-supervised learning provides an opportunity to explore unlabeled chest X-rays and their associated free-text reports accumulated in clinical routine without manual supervision. This paper proposes a Joint Image Text Representation Learning Network (JoImTeRNet) for pre-training on chest X-ray images and their radiology reports. The model was pre-trained on both the global image-sentence level and the local image region-word level for visual-textual matching. Both are bidirectionally constrained on Cross-Entropy based and ranking-based Triplet Matching Losses. The region-word matching is calculated using the attention mechanism without direct supervision about their mapping. The pre-trained multi-modal representation learning paves the way for downstream tasks concerning image and/or text encoding. We demonstrate the representation learning quality by cross-modality retrievals and multi-label classifications on two datasets: OpenI-IU and MIMIC-CXR

【8】 Customer 360-degree Insights in Predicting Chronic Diabetes 标题:顾客360度洞察预测慢性糖尿病 链接:https://arxiv.org/abs/2109.01863

作者:Asish Satpathy,Satyajit Behari 机构:a W. P. Carey School of Business, Arizona State University, Tempe, AZ, USA, b Univar Solutions Inc., Downers Grove, IL, USA, Key Words:, Data Mining in Health Care; Classification Analysis with Lifestyle and Demographic Data; Customers 备注:Submitted to journal for publication 摘要:糖尿病等慢性病在世界上相当普遍,每年造成大量死亡。此外,此类慢性病的治疗费用也很高。然而,研究表明,在降低这些医疗成本的同时,可以主动管理和预防糖尿病。我们对代表美国德克萨斯州的1000万客户的360度数据进行了样本挖掘,截至2018年底,这些数据的属性是最新的。从市场研究数据供应商处获得的样本有1000多个客户属性,包括人口统计、生活方式,在某些情况下还包括自我报告的慢性病。在这项研究中,我们开发了一种预测慢性糖尿病的分类模型,准确率为80%。我们演示了一个用例,其中大量360度客户数据可用于预测并因此主动预防糖尿病等慢性病。 摘要:Chronic diseases such as diabetes are quite prevalent in the world and are responsible for a significant number of deaths per year. In addition, treatments for such chronic diseases account for a high healthcare cost. However, research has shown that diabetes can be proactively managed and prevented while lowering these healthcare costs. We have mined a sample of ten million customers' 360-degree data representing the state of Texas, USA, with attributes current as of late 2018. The sample received from a market research data vendor has over 1000 customer attributes consisting of demography, lifestyle, and in some cases self-reported chronic conditions. In this study, we have developed a classification model to predict chronic diabetes with an accuracy of 80%. We demonstrate a use case where a large volume of 360-degree customer data can be useful to predict and hence proactively prevent chronic diseases such as diabetes.

【9】 Multimodal Detection of COVID-19 Symptoms using Deep Learning & Probability-based Weighting of Modes 标题:基于深度学习和概率加权的冠状病毒症状多模态检测 链接:https://arxiv.org/abs/2109.01669

作者:Meysam Effati,Yu-Chen Sun,Hani E. Naguib,Goldie Nejat 机构:IEEE, Autonomous Systems and Biomechatronics Laboratory (ASBLab), Toronto Smart Materials and Structures (TSMART), Department of Mechanical and Industrial Engineering, University of Toronto, Toronto Rehabilitation Institute 备注:5 Pages, 1 Figure, To appear in The 7th International Conference on Wireless and Mobile Computing, Networking and Communication (IEEE eHPWAS - 17th IEEE WiMob - rank B), 2021 摘要:新冠病毒-19大流行是21世纪最具挑战性的医疗危机之一。随着该病毒继续在全球范围内传播,大部分努力都集中在疫苗的开发和公众的大规模免疫上。虽然每日病例数呈下降趋势,但新病毒突变和变异的出现仍然构成重大威胁。随着经济开始复苏,社会开始开放,人们重新进入办公楼、学校和商场,我们仍然需要有能力检测并尽量减少新冠病毒的传播。新冠病毒感染者可能会出现咳嗽、发烧和呼吸急促等多种症状。许多现有的检测技术关注具有相同重要性的症状。然而,研究表明,一些症状比其他症状更普遍。在本文中,我们提出了一种多模态方法来预测新冠病毒-19,方法是结合使用卷积神经网络的现有深度学习分类器和我们新的基于概率的加权函数,该加权函数考虑了每种症状的流行情况。这些实验是在现有的数据集上进行的,涉及咳嗽、发烧和呼吸急促的三种模式。结果表明,与同等加权函数相比,使用我们的加权函数在检测新冠病毒-19方面有显著的改进。 摘要:The COVID-19 pandemic is one of the most challenging healthcare crises during the 21st century. As the virus continues to spread on a global scale, the majority of efforts have been on the development of vaccines and the mass immunization of the public. While the daily case numbers were following a decreasing trend, the emergent of new virus mutations and variants still pose a significant threat. As economies start recovering and societies start opening up with people going back into office buildings, schools, and malls, we still need to have the ability to detect and minimize the spread of COVID-19. Individuals with COVID-19 may show multiple symptoms such as cough, fever, and shortness of breath. Many of the existing detection techniques focus on symptoms having the same equal importance. However, it has been shown that some symptoms are more prevalent than others. In this paper, we present a multimodal method to predict COVID-19 by incorporating existing deep learning classifiers using convolutional neural networks and our novel probability-based weighting function that considers the prevalence of each symptom. The experiments were performed on an existing dataset with respect to the three considered modes of coughs, fever, and shortness of breath. The results show considerable improvements in the detection of COVID-19 using our weighting function when compared to an equal weighting function.
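这种基于症状流行度的加权可以理解为对各模态分类器输出概率的加权平均。下面是一个最小示意(权重数值纯属举例,并非论文统计出的流行度):

```python
def weighted_fusion(probs, prevalence):
    """probs:各模态分类器给出的 COVID-19 阳性概率,如 {"cough": 0.8, ...};
    prevalence:各症状的流行度权重。返回加权融合后的概率。"""
    total_w = sum(prevalence[m] for m in probs)
    return sum(prevalence[m] * p for m, p in probs.items()) / total_w

probs = {"cough": 0.80, "fever": 0.55, "breath": 0.30}
prevalence = {"cough": 0.68, "fever": 0.44, "breath": 0.19}  # 假设的流行度
print(weighted_fusion(probs, prevalence))
```

与等权平均相比,流行度更高的症状(如咳嗽)对最终判断的影响更大,这正是摘要所述的出发点。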

【10】 Artificial Intelligence in Dry Eye Disease 标题:人工智能在干眼病中的应用 链接:https://arxiv.org/abs/2109.01658

作者:Andrea M. Storås,Inga Strümke,Michael A. Riegler,Jakob Grauslund,Hugo L. Hammer,Anis Yazidi,Pål Halvorsen,Kjell G. Gundersen,Tor P. Utheim,Catherine Jackson 机构:SimulaMet, Oslo, Norway, Department of Ophthalmology, Odense University Hospital, Odense, Denmark, Department of Computer Science, Oslo Metropolitan University, Norway, Department of Medical Biochemistry, Oslo University Hospital, Norway 摘要:干眼病(DED)的患病率在5%到50%之间,这取决于所使用的诊断标准和研究人群。然而,它仍然是眼科最未得到充分诊断和治疗的疾病之一。许多用于诊断DED的测试依赖于有经验的观察者进行图像解读,这可能被认为是主观的,并导致诊断的变化。由于人工智能(AI)系统能够解决高级问题,因此使用此类技术可以实现更客观的诊断。虽然“AI”一词被广泛使用,但其在医学上的应用最近取得的成功主要归功于机器学习子领域的进步,机器学习已被用于自动分类图像和预测医疗结果。强大的机器学习技术已被用来理解患者数据和医学图像中的细微差别,旨在实现疾病严重程度的一致诊断和分层。这是第一篇关于人工智能在DED中应用的文献综述。我们简要介绍人工智能,报告其在DED研究中的应用现状及其在临床上的应用潜力。我们的综述发现,人工智能已被广泛应用于DED临床测试和研究应用中,主要用于干涉测量、裂隙灯和睑板腺成像(meibography)图像的解释。虽然初步结果是有希望的,但在模型开发、临床测试和标准化方面仍需要做大量工作。 摘要:Dry eye disease (DED) has a prevalence of between 5 and 50%, depending on the diagnostic criteria used and population under study. However, it remains one of the most underdiagnosed and undertreated conditions in ophthalmology. Many tests used in the diagnosis of DED rely on an experienced observer for image interpretation, which may be considered subjective and result in variation in diagnosis. Since artificial intelligence (AI) systems are capable of advanced problem solving, use of such techniques could lead to more objective diagnosis. Although the term "AI" is commonly used, recent success in its applications to medicine is mainly due to advancements in the sub-field of machine learning, which has been used to automatically classify images and predict medical outcomes. Powerful machine learning techniques have been harnessed to understand nuances in patient data and medical images, aiming for consistent diagnosis and stratification of disease severity. This is the first literature review on the use of AI in DED. We provide a brief introduction to AI, report its current use in DED research and its potential for application in the clinic. Our review found that AI has been employed in a wide range of DED clinical tests and research applications, primarily for interpretation of interferometry, slit-lamp and meibography images. While initial results are promising, much work is still needed on model development, clinical testing and standardisation.

【11】 Interpretable Automated Diagnosis of Retinal Disease using Deep OCT Analysis 标题:基于深度OCT分析的可解释性视网膜疾病自动诊断 链接:https://arxiv.org/abs/2109.02436

作者:Evan Wen,Max Ehrlich 机构:The Pingry School, University of Maryland, College Park 摘要:每年有3000万光学相干断层扫描(OCT)成像测试用于诊断各种视网膜疾病,但OCT扫描的准确诊断需要训练有素的眼科医生,他们仍然容易做出错误分类。有了更好的诊断系统,许多由视网膜疾病引起的视力丧失病例可以完全避免。在这项工作中,我们开发了一个基于CNN的模型,用于CNV、DME、Drusen和正常OCT扫描的精确分类。此外,我们强调对模型决策的定性和定量解释。我们的类加权EfficientNet-B2分类模型的准确率为99.79%。然后,我们制作并分析了OCT扫描中模型聚焦位置的热图。制作热图后,我们创建了模型关注的特定视网膜层的分解图。虽然之前已经开发出了高度精确的模型,但我们的工作是第一次对模型的决策进行详细解释。在我们的工作中,准确性和可解释性的结合可以在临床上用于更好的患者护理。未来的工作可以使用类似的模型对更大、更多样化的数据集进行分类。 摘要:30 million Optical Coherence Tomography (OCT) imaging tests are issued every year to diagnose various retinal diseases, but accurate diagnosis of OCT scans requires trained ophthalmologists who are still prone to making misclassifications. With better systems for diagnosis, many cases of vision loss caused by retinal disease could be entirely avoided. In this work, we developed a CNN-based model for accurate classification of CNV, DME, Drusen, and Normal OCT scans. Furthermore, we placed an emphasis on producing both qualitative and quantitative explanations of the model's decisions. Our class-weighted EfficientNet B2 classification model performed at 99.79% accuracy. We then produced and analyzed heatmaps of where in the OCT scan the model focused. After producing the heatmaps, we created breakdowns of the specific retinal layers the model focused on. While highly accurate models have been previously developed, our work is the first to produce detailed explanations of the model's decisions. The combination of accuracy and interpretability in our work can be clinically applied for better patient care. Future work can use a similar model for classification on larger and more diverse data sets.

【12】 Automated detection of COVID-19 cases from chest X-ray images using deep neural network and XGBoost 标题:基于深度神经网络和XGBoost的胸片冠状病毒病例自动检测 链接:https://arxiv.org/abs/2109.02428

作者:Hamid Nasiri,Sharif Hasani 机构: Department of Computer Engineering, Amirkabir University of Technology, Tehran, Iran, Electrical and Computer Engineering Department, Semnan University, Semnan, Iran 摘要:2019年末,在全球爆发新冠病毒-19大流行后,许多研究人员和学者试图提供新冠病毒-19病例的检测方法。因此,本研究侧重于从胸部X射线图像中识别新冠病毒-19病例。本文提出了一种从X射线图像诊断冠状病毒病的新方法。在该方法中,使用DenseNet169深度神经网络提取患者胸部X射线图像的特征,然后将提取的特征作为极端梯度提升(XGBoost)算法的输入,以执行分类任务。对所提出方法的评估及其与近年来提出的方法的比较表明,所提出的方法比现有方法更准确、更快,并且在从X射线图像检测新冠病毒-19病例方面具有可接受的性能。 摘要:In late 2019 and after COVID-19 pandemic in the world, many researchers and scholars have tried to provide methods for detection of COVID-19 cases. Accordingly, this study focused on identifying COVID-19 cases from chest X-ray images. In this paper, a novel approach to diagnosing coronavirus disease from X-ray images was proposed. In the proposed method, DenseNet169 deep neural network was used to extract the features of X-ray images taken from the patients' chest and the extracted features were then given as input to the Extreme Gradient Boosting (XGBoost) algorithm so that it could perform the classification task. Evaluation of the proposed approach and its comparison with the methods presented in recent years revealed that the proposed method was more accurate and faster than the existing ones and had an acceptable performance in detection of COVID-19 cases from X-ray images.
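该流程可以用 torchvision 与 xgboost 直接组合:用预训练 DenseNet169 的主干提取特征,再在特征上训练 XGBoost 分类器。下面是一个最小示意(随机张量代替真实胸片,预处理与数据装载从略;weights="DEFAULT" 需要较新版本的 torchvision):

```python
import torch
import torchvision.models as models
from xgboost import XGBClassifier

backbone = models.densenet169(weights="DEFAULT")
backbone.classifier = torch.nn.Identity()   # 去掉分类头,仅保留 1664 维池化特征
backbone.eval()

@torch.no_grad()
def extract(images):                        # images: (N, 3, 224, 224) 归一化张量
    return backbone(images).numpy()

# 假设 train_imgs 为预处理后的胸片张量,y_train 为 0/1 标签
train_imgs, y_train = torch.randn(8, 3, 224, 224), [0, 1] * 4
clf = XGBClassifier(n_estimators=200, eval_metric="logloss")
clf.fit(extract(train_imgs), y_train)
print(clf.predict(extract(torch.randn(2, 3, 224, 224))))
```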

【13】 A Critical Review of the state-of-the-art on Deep Neural Networks for Blood Glucose Prediction in Patients with Diabetes 标题:深度神经网络在糖尿病患者血糖预测中的研究进展 链接:https://arxiv.org/abs/2109.02178

作者:Felix Tena,Oscar Garnica,Juan Lanchares,J. Ignacio Hidalgo 备注:17 pages, 20 figures and 16 tables 摘要:本文比较了最近提出的十种神经网络,并提出了两种基于集成神经网络的血糖预测模型。在相同的数据集、预处理工作流和工具下,使用OhioT1DM数据集在三个不同的预测时间段(30、60和120分钟)对所有这些数据集进行测试。我们使用血糖预测中最常见的指标来比较它们的性能,并使用三种方法对性能最好的算法进行排名,这三种方法是为统计比较多种算法的性能而设计的:scmamp、模型置信集和卓越的预测能力。我们的分析强调了那些具有最高概率成为最佳预测因子的模型,估计了与最佳模型相比表现较差的模型的误差增加,并为其在临床实践中的使用提供了指南。 摘要:This article compares ten recently proposed neural networks and proposes two ensemble neural network-based models for blood glucose prediction. All of them are tested under the same dataset, preprocessing workflow, and tools using the OhioT1DM Dataset at three different prediction horizons: 30, 60, and 120 minutes. We compare their performance using the most common metrics in blood glucose prediction and rank the best-performing ones using three methods devised for the statistical comparison of the performance of multiple algorithms: scmamp, model confidence set, and superior predictive ability. Our analysis highlights those models with the highest probability of being the best predictors, estimates the increase in error of the models that perform more poorly with respect to the best ones, and provides a guide for their use in clinical practice.

【14】 How Reliable Are Out-of-Distribution Generalization Methods for Medical Image Segmentation? 标题:分布外泛化方法在医学图像分割中的可靠性如何? 链接:https://arxiv.org/abs/2109.01668

作者:Antoine Sanner,Camila Gonzalez,Anirban Mukhopadhyay 机构:Technical University of Darmstadt, Karolinenpl. , Darmstadt, Germany 摘要:深度学习的最新成果依赖于测试数据在分布上与训练数据相似。在理想情况下,深度学习模型将实现分布外(OoD)泛化,即可靠地对分布外数据进行预测。然而在实践中,当面对分布的变化时,模型通常不能很好地概括。因此,设计了几种方法,通过基于正则化或域预测的方案来提高模型学习的特征的鲁棒性。分割医学图像,如海马磁共振成像,对于神经精神疾病的诊断和治疗至关重要。但是,由于患者的年龄和影响器官形状的各种病理因素,这些大脑图像的分布经常会发生变化。在这项工作中,我们使用全监督和半监督训练评估了磁共振数据中海马分割问题的OoD泛化解决方案。我们发现没有一种方法在所有实验中都能可靠地执行。只有V-REx的损失突出,因为它仍然易于调整,而在大多数情况下,它的表现优于标准的U-Net。 摘要:The recent achievements of Deep Learning rely on the test data being similar in distribution to the training data. In an ideal case, Deep Learning models would achieve Out-of-Distribution (OoD) Generalization, i.e. reliably make predictions on out-of-distribution data. Yet in practice, models usually fail to generalize well when facing a shift in distribution. Several methods were thereby designed to improve the robustness of the features learned by a model through Regularization- or Domain-Prediction-based schemes. Segmenting medical images such as MRIs of the hippocampus is essential for the diagnosis and treatment of neuropsychiatric disorders. But these brain images often suffer from distribution shift due to the patient's age and various pathologies affecting the shape of the organ. In this work, we evaluate OoD Generalization solutions for the problem of hippocampus segmentation in MR data using both fully- and semi-supervised training. We find that no method performs reliably in all experiments. Only the V-REx loss stands out as it remains easy to tune, while it outperforms a standard U-Net in most cases.
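文中表现突出的 V-REx 损失形式很简单:各训练域上经验风险的均值,加上跨域风险方差的惩罚项。下面是一个 PyTorch 最小示意(beta 为需要调节的超参数,数值仅为举例):

```python
import torch

def vrex_loss(env_risks, beta=10.0):
    """env_risks:每个训练域上的经验风险(标量张量)列表。
    V-REx:mean(risks) + beta * var(risks),方差项鼓励各域风险一致。"""
    risks = torch.stack(env_risks)
    return risks.mean() + beta * risks.var(unbiased=False)

# 用法示例:三个域(如不同扫描来源)上的分割损失
print(vrex_loss([torch.tensor(0.30), torch.tensor(0.35), torch.tensor(0.60)]))
```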

蒸馏|知识提取(1篇)

【1】 Nonparametric Extrema Analysis in Time Series for Envelope Extraction, Peak Detection and Clustering 标题:时间序列非参数极值分析在包络提取、峰值检测和聚类中的应用 链接:https://arxiv.org/abs/2109.02082

作者:Kaan Gokcesu,Hakan Gokcesu 摘要:在本文中,我们提出了一种非参数方法,可用于包络提取、峰值突发检测和时间序列聚类。我们的问题形式化导致了时间序列的自然分割/分叉。通过可能的分层实现,它可以用于机器学习、信号处理和数学金融中的各种应用。从输入信号开始,我们的迭代过程通过最小化累积的$L_1$漂移,依次创建两个信号(一个上界信号和一个下界信号)。我们证明了利用类似维特比的路径跟踪算法和最优消去规则可以有效地计算出解。我们考虑许多有趣的设置,其中我们的算法具有接近线性时间复杂度。 摘要:In this paper, we propose a nonparametric approach that can be used in envelope extraction, peak-burst detection and clustering in time series. Our problem formalization results in a naturally defined splitting/forking of the time series. With a possibly hierarchical implementation, it can be used for various applications in machine learning, signal processing and mathematical finance. From an incoming input signal, our iterative procedure sequentially creates two signals (one upper bounding and one lower bounding signal) by minimizing the cumulative $L_1$ drift. We show that a solution can be efficiently calculated by use of a Viterbi-like path tracking algorithm together with an optimal elimination rule. We consider many interesting settings, where our algorithm has near-linear time complexities.

推荐(3篇)

【1】 Recommendation System Simulations: A Discussion of Two Key Challenges 标题:推荐系统模拟:两个关键挑战的讨论 链接:https://arxiv.org/abs/2109.02475

作者:Allison J. B. Chaney 机构: Duke University 备注:6 pages 摘要:随着推荐系统越来越成为在线平台的标准,模拟为理解这些系统对个人和社会的影响提供了一条途径。在构建推荐系统模拟时,有两个关键挑战:第一,定义用户选择或参与推荐项目的模型;第二,定义一种机制,让用户遇到平台不直接推荐给用户的项目,例如共享特定内容的朋友。本文将深入研究这两个挑战,回顾现有研究中的模拟假设,并提出替代假设。我们还对模拟的局限性进行了更广泛的讨论,并概述了该领域的开放性问题。 摘要:As recommendation systems become increasingly standard for online platforms, simulations provide an avenue for understanding the impacts of these systems on individuals and society. When constructing a recommendation system simulation, there are two key challenges: first, defining a model for users selecting or engaging with recommended items and second, defining a mechanism for users encountering items that are not recommended to the user directly by the platform, such as by a friend sharing specific content. This paper will delve into both of these challenges, reviewing simulation assumptions from existing research and proposing alternative assumptions. We also include a broader discussion of the limitations of simulations and outline of open questions in this area.

【2】 Practical and Secure Federated Recommendation with Personalized Masks 标题:基于个性化掩码的实用且安全的联邦推荐 链接:https://arxiv.org/abs/2109.02464

作者:Liu Yang,Ben Tan,Bo Liu,Vincent W. Zheng,Kai Chen,Qiang Yang 机构: Hong Kong University of Science and Technology 备注:18 pages 摘要:联邦推荐是私有分布式推荐系统的一个新概念。它旨在解决数据孤岛和隐私问题。目前的联邦推荐系统主要利用同态加密和差分隐私方法来保护中间计算结果。然而,前者带来了额外的通信和计算成本,后者损害了模型的准确性。两者都不能同时满足推荐系统的实时反馈和准确个性化需求。在本文中,我们提出了一个新的联邦推荐框架,称为联邦掩码矩阵分解(federated masked matrix factorization)。联邦掩码矩阵分解可以在不牺牲效率或效果的情况下保护联邦推荐系统中的数据隐私。我们没有使用同态加密和差分隐私,而是利用秘密共享技术结合了联邦矩阵分解的安全聚合过程。与同态加密相比,秘密共享大大加快了整个训练过程。此外,我们还引入了个性化掩码的新思想,并将其应用于提出的联邦掩码矩阵分解框架中。一方面,个性化掩码可以进一步提高效率。另一方面,个性化掩码也有助于提升效果。通过实证,我们证明了所设计的模型在不同的真实数据集上的优越性。此外,我们还提供了隐私保证,并讨论了个性化掩码方法在一般联邦学习任务中的扩展。 摘要:Federated recommendation is a new notion of private distributed recommender systems. It aims to address the data silo and privacy problems altogether. Current federated recommender systems mainly utilize homomorphic encryption and differential privacy methods to protect the intermediate computational results. However, the former comes with extra communication and computation costs, the latter damages model accuracy. Neither of them could simultaneously satisfy the real-time feedback and accurate personalization requirements of recommender systems. In this paper, we proposed a new federated recommendation framework, named federated masked matrix factorization. Federated masked matrix factorization could protect the data privacy in federated recommender systems without sacrificing efficiency or efficacy. Instead of using homomorphic encryption and differential privacy, we utilize the secret sharing technique to incorporate the secure aggregation process of federated matrix factorization. Compared with homomorphic encryption, secret sharing largely speeds up the whole training process. In addition, we introduce a new idea of personalized masks and apply it in the proposed federated masked matrix factorization framework. On the one hand, personalized masks could further improve efficiency. On the other hand, personalized masks also benefit efficacy. Empirically, we show the superiority of the designed model on different real-world data sets. Besides, we also provide the privacy guarantee and discuss the extension of the personalized mask method to the general federated learning tasks.
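其中的秘密共享聚合可用"加性分片"直观说明:每个客户端把更新拆成若干随机份额,份额之和恰等于原更新;服务器只能看到份额之和,看不到任何单个客户端的更新。下面是一个最小示意(省略有限域/定点编码等工程细节,属于刻意简化):

```python
import numpy as np

def additive_shares(update, n_parties, rng):
    """把向量 update 拆成 n_parties 份随机份额,份额之和恰为 update。"""
    shares = [rng.normal(size=update.shape) for _ in range(n_parties - 1)]
    shares.append(update - sum(shares))
    return shares

rng = np.random.default_rng(0)
client_updates = [rng.normal(size=4) for _ in range(3)]   # 三个客户端的梯度

# 每个客户端把份额分发给各参与方;每个参与方只上交手中份额之和
pooled = [additive_shares(u, 3, rng) for u in client_updates]
partial_sums = [sum(pooled[c][p] for c in range(3)) for p in range(3)]
aggregate = sum(partial_sums)

assert np.allclose(aggregate, sum(client_updates))  # 聚合结果等于明文更新之和
```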

【3】 Representation Learning for Efficient and Effective Similarity Search and Recommendation 标题:基于表征学习的高效相似性搜索与推荐 链接:https://arxiv.org/abs/2109.01815

作者:Casper Hansen 机构:University of Copenhagen; Advisors: Stephen Alstrup, Christina Lioma, Jakob Grue Simonsen 备注:PhD Thesis, School of The Faculty of Science, University of Copenhagen 摘要:如何表示和操作数据对于构建既有效又高效的计算解决方案至关重要。一种常见的方法是将数据对象表示为二进制向量,称为哈希码(hash code),它只需要很少的存储空间,并通过直接索引到哈希表或通过在适当的空间中进行相似性计算来实现高效的相似性搜索。由于与实值表示相比,哈希码的表达能力有限,因此一个核心的公开挑战是如何生成能够使用少量比特很好地捕获语义内容或潜在属性的哈希码,同时确保哈希码以不降低其搜索效率的方式分布。最先进的方法使用表示学习来生成此类哈希码,重点关注神经自动编码器架构,其中语义通过学习重构哈希码的原始输入而被编码到哈希码中。本论文解决了上述挑战,并对表示学习做出了多项贡献:(i)通过比现有技术更具表现力的表示,以及比当前最优度量(即汉明距离)更有效的相似性度量,来提高哈希码的有效性;(ii)通过学习特别适合于所选搜索方法的表示来提高哈希码的效率。这些贡献在几个与相似性搜索和推荐相关的任务上得到了实证验证。 摘要:How data is represented and operationalized is critical for building computational solutions that are both effective and efficient. A common approach is to represent data objects as binary vectors, denoted hash codes, which require little storage and enable efficient similarity search through direct indexing into a hash table or through similarity computations in an appropriate space. Due to the limited expressibility of hash codes, compared to real-valued representations, a core open challenge is how to generate hash codes that well capture semantic content or latent properties using a small number of bits, while ensuring that the hash codes are distributed in a way that does not reduce their search efficiency. State of the art methods use representation learning for generating such hash codes, focusing on neural autoencoder architectures where semantics are encoded into the hash codes by learning to reconstruct the original inputs of the hash codes. This thesis addresses the above challenge and makes a number of contributions to representation learning that (i) improve effectiveness of hash codes through more expressive representations and a more effective similarity measure than the current state of the art, namely the Hamming distance, and (ii) improve efficiency of hash codes by learning representations that are especially suited to the choice of search method. The contributions are empirically validated on several tasks related to similarity search and recommendation.
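哈希码检索之所以高效,是因为汉明距离可以用"异或 + 位计数"实现。下面是一个最小示意(以 64 位整数表示哈希码,数据为随机生成的示例):

```python
import numpy as np

def hamming(a: int, b: int) -> int:
    """两个等长哈希码(以整数表示的位串)之间的汉明距离:异或后数 1 的个数。"""
    return bin(a ^ b).count("1")

db = np.random.default_rng(0).integers(0, 2**63, size=5)    # 哈希码"数据库"
query = int(db[2]) ^ 0b101          # 构造一个与 db[2] 仅相差 2 位的查询
print(sorted(db, key=lambda h: hamming(query, int(h)))[:3])  # 按距离升序检索
```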

聚类(1篇)

【1】 Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss 标题:用于最小化网络量化损失的带比特丢弃的簇促进量化 链接:https://arxiv.org/abs/2109.02100

作者:Jung Hyun Lee,Jihun Yun,Sung Ju Hwang,Eunho Yang 机构:Korea Advanced Institute of Science and Technology (KAIST),AITRICS 备注:Accepted to ICCV 2021 摘要:网络量化(network quantization)旨在减少网络权重和激活值的位长,以便将模型部署到资源受限的设备上。尽管最近的研究已经成功地将全精度网络离散化,但它们在训练后仍会产生较大的量化误差,从而导致全精度网络与其量化网络之间存在显著的性能差距。在这项工作中,我们提出了一种新的神经网络量化方法,即聚类促进量化(CPQ),它可以找到最佳的量化网格,同时自然地鼓励底层的全精度权重在训练期间聚集在这些量化网格周围。CPQ的这一特性归功于我们实现可微量化的两个主要因素:i)在前向传递中使用由特定概率参数化设计的分类分布,以及ii)在后向传递中使用我们提出的多类直通估计器(STE)。由于我们的第二个组件,多类STE,本质上是有偏差的,我们另外提出了一种新的位丢弃技术DropBits,它修改了标准的dropout正则化,以随机丢弃位而不是神经元。作为DropBits的自然扩展,我们进一步介绍了通过对DropBits施加额外的正则化来学习异构量化级别的方法,以便为每一层找到合适的比特长度。我们在各种基准数据集和网络架构上对我们的方法进行了实验验证,并支持一个新的量化假设:学习异构量化级别优于从头开始使用相同但固定量化级别的情况。 摘要:Network quantization, which aims to reduce the bit-lengths of the network weights and activations, has emerged for their deployments to resource-limited devices. Although recent studies have successfully discretized a full-precision network, they still incur large quantization errors after training, thus giving rise to a significant performance gap between a full-precision network and its quantized counterpart. In this work, we propose a novel quantization method for neural networks, Cluster-Promoting Quantization (CPQ) that finds the optimal quantization grids while naturally encouraging the underlying full-precision weights to gather around those quantization grids cohesively during training. This property of CPQ is thanks to our two main ingredients that enable differentiable quantization: i) the use of the categorical distribution designed by a specific probabilistic parametrization in the forward pass and ii) our proposed multi-class straight-through estimator (STE) in the backward pass. Since our second component, multi-class STE, is intrinsically biased, we additionally propose a new bit-drop technique, DropBits, that revises the standard dropout regularization to randomly drop bits instead of neurons. As a natural extension of DropBits, we further introduce the way of learning heterogeneous quantization levels to find proper bit-length for each layer by imposing an additional regularization on DropBits. We experimentally validate our method on various benchmark datasets and network architectures, and also support a new hypothesis for quantization: learning heterogeneous quantization levels outperforms the case using the same but fixed quantization levels from scratch.
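文中反向传播所依赖的直通估计器(STE)在 PyTorch 中通常写成"前向取整、反向恒等"。下面给出一个普通均匀量化 STE 的最小示意(仅说明机制,并非论文提出的多类 STE 或 DropBits 本身):

```python
import torch

def ste_quantize(x, bits=4):
    """均匀量化到 2^bits 个级别;x + (q - x).detach() 使得
    前向传播使用量化值 q,反向传播梯度按恒等函数直接穿过。"""
    levels = 2 ** bits - 1
    q = torch.round(x.clamp(0, 1) * levels) / levels
    return x + (q - x).detach()

w = torch.rand(5, requires_grad=True)
ste_quantize(w).sum().backward()
print(w.grad)   # 全为 1:梯度如同未量化时一样流过
```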

自动驾驶|车辆|车道检测等(2篇)

【1】 Comparing the Machine Readability of Traffic Sign Pictograms in Austria and Germany 标题:奥地利和德国交通标志象形图的机器可读性比较 链接:https://arxiv.org/abs/2109.02362

作者:Alexander Maletzky,Stefan Thumfart,Christoph Wruß 机构: Research Unit Medical Informatics, RISC Software GmbH, ASFINAG Service GmbH 摘要:我们比较了奥地利和德国交通标志上的象形图的机器可读性。为此,我们在合成数据集上训练分类模型,并在受控环境下评估其分类精度。特别是,我们关注两国目前部署的象形图之间的差异,以及一组旨在提高人类可读性的新象形图。除其他结果外,我们发现机器学习模型对于没有经过训练的象形图设计的数据集的泛化能力较差。我们的结论是,高级驾驶员辅助系统(ADAS)制造商必须特别小心,以正确处理当前和新设计的交通标志象形图之间以及来自不同国家的象形图之间的微小视觉差异。 摘要:We compare the machine readability of pictograms found on Austrian and German traffic signs. To that end, we train classification models on synthetic data sets and evaluate their classification accuracy in a controlled setting. In particular, we focus on differences between currently deployed pictograms in the two countries, and a set of new pictograms designed to increase human readability. Besides other results, we find that machine-learning models generalize poorly to data sets with pictogram designs they have not been trained on. We conclude that manufacturers of advanced driver-assistance systems (ADAS) must take special care to properly address small visual differences between current and newly designed traffic sign pictograms, as well as between pictograms from different countries.

【2】 Identification of Driver Phone Usage Violations via State-of-the-Art Object Detection with Tracking 标题:通过带跟踪的最新对象检测识别司机电话使用违规 链接:https://arxiv.org/abs/2109.02119

作者:Steven Carrell,Amir Atapour-Abarghouei 机构:School of Computing, Newcastle University, Newcastle upon Tyne, United Kingdom 备注:10 pages 摘要:在道路交通事故中,驾驶时使用手机一直是一个主要因素,而捕获此类违规行为的过程可能是一项艰巨的任务。现代目标检测框架和高性能硬件的进步为视频监控的自动化铺平了道路。在这项工作中,我们提出了一种经过定制训练的最先进的物体检测器,它可以与路边摄像头配合使用,以捕获驾驶员的手机使用情况,而无需人工干预。所提出的方法还解决了挡风玻璃眩光引起的问题,并介绍了解决此问题所需的步骤。我们使用四种流行的对象检测方法(YOLO、SSD、Faster R-CNN和CenterNet),在我们的自定义数据集上对12个预训练模型进行了微调。在所有测试的目标检测器中,YOLO的精确度最高,达到96%(AP10),帧速率最高可达30 FPS。DeepSort目标跟踪算法还集成到性能最佳的模型中,以仅收集唯一违规的记录,并使所提出的方法能够计算车辆数量。拟议的自动化系统将收集已识别违规的输出图像、每次违规的时间戳和车辆总数。可以通过专门构建的用户界面访问数据。 摘要:The use of mobiles phones when driving have been a major factor when it comes to road traffic incidents and the process of capturing such violations can be a laborious task. Advancements in both modern object detection frameworks and high-performance hardware has paved the way for a more automated approach when it comes to video surveillance. In this work, we propose a custom-trained state-of-the-art object detector to work with roadside cameras to capture driver phone usage without the need for human intervention. The proposed approach also addresses the issues caused by windscreen glare and introduces the steps required to remedy this. Twelve pre-trained models are fine-tuned with our custom dataset using four popular object detection methods: YOLO, SSD, Faster R-CNN, and CenterNet. Out of all the object detectors tested, the YOLO yields the highest accuracy levels of up to 96% (AP10) and frame rates of up to ~30 FPS. DeepSort object tracking algorithm is also integrated into the best-performing model to collect records of only the unique violations, and enable the proposed approach to count the number of vehicles. The proposed automated system will collect the output images of the identified violations, timestamps of each violation, and total vehicle count. Data can be accessed via a purpose-built user interface.

Federated learning | Privacy protection | Encryption (4 papers)

【1】 Byzantine-Robust Federated Learning via Credibility Assessment on Non-IID Data Link: https://arxiv.org/abs/2109.02396

Authors: Kun Zhai, Qiang Ren, Junli Wang, Chungang Yan Affiliations: Key Laboratory of Embedded System and Service Computing (Tongji University), Ministry of Education, Shanghai, China Abstract: Federated learning is a novel framework that enables resource-constrained edge devices to jointly learn a model, which solves the problems of data protection and data islands. However, standard federated learning is vulnerable to Byzantine attacks, which can cause the global model to be manipulated by the attacker or fail to converge. On non-iid data, current methods are not effective in defending against Byzantine attacks. In this paper, we propose a Byzantine-robust framework for federated learning via credibility assessment on non-iid data (BRCA). The credibility assessment is designed to detect Byzantine attacks by combining an adaptive anomaly detection model with data verification. Specifically, an adaptive mechanism is incorporated into the anomaly detection model for its training and prediction. Simultaneously, a unified update algorithm is given to guarantee that the global model keeps a consistent direction. On non-iid data, our experiments demonstrate that BRCA is more robust to Byzantine attacks than conventional methods.
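
One generic form of credibility-weighted aggregation is sketched below; the credibility scores are assumed to come from the (abstracted-away) anomaly detection and data verification steps, and this is not necessarily BRCA's exact update rule.

import numpy as np

def credibility_weighted_average(updates, credibility):
    # updates: list of flattened client updates; credibility: scores in [0, 1].
    w = np.asarray(credibility, dtype=float)
    if w.sum() == 0:
        w = np.full(len(updates), 1.0)  # fall back to plain averaging
    return np.average(np.stack(updates), axis=0, weights=w / w.sum())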

【2】 On Second-order Optimization Methods for Federated Learning Link: https://arxiv.org/abs/2109.02388

Authors: Sebastian Bischoff, Stephan Günnemann, Martin Jaggi, Sebastian U. Stich Affiliations: Switzerland; Technical University of Munich Note: ICML 2021 Workshop "Beyond first-order methods in ML systems" Abstract: We consider federated learning (FL), where the training data is distributed across a large number of clients. The standard optimization method in this setting is Federated Averaging (FedAvg), which performs multiple local first-order optimization steps between communication rounds. In this work, we evaluate the performance of several second-order distributed methods with local steps in the FL setting, which promise favorable convergence properties. We (i) show that FedAvg performs surprisingly well against its second-order competitors when evaluated under fair metrics (equal amounts of local computation), in contrast to the results of previous work. Based on our numerical study, we propose (ii) a novel variant that uses second-order local information for updates and a global line search to counteract the resulting local specificity.
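
For reference, one FedAvg round with multiple local first-order steps looks as follows (a minimal NumPy sketch; the paper's second-order variants replace the inner gradient step with a Newton-type update plus a global line search).

import numpy as np

def fedavg_round(global_w, client_grads, n_local_steps=5, lr=0.1):
    # client_grads: one stochastic-gradient oracle g(w) per client.
    local_models = []
    for grad in client_grads:
        w = global_w.copy()
        for _ in range(n_local_steps):  # local first-order steps
            w -= lr * grad(w)
        local_models.append(w)
    return np.mean(local_models, axis=0)  # server-side averaging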

【3】 Reconfigurable Intelligent Surface Empowered Over-the-Air Federated Edge Learning Link: https://arxiv.org/abs/2109.02353

Authors: Hang Liu, Zehong Lin, Xiaojun Yuan, Ying-Jun Angela Zhang Affiliations: The Chinese University of Hong Kong; University of Electronic Science and Technology of China Note: This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible Abstract: Federated edge learning (FEEL) has emerged as a revolutionary paradigm to develop AI services at the edge of 6G wireless networks, as it supports collaborative model training across a massive number of mobile devices. However, model communication over wireless channels, especially the uplink model uploading of FEEL, has been widely recognized as a bottleneck that critically limits the efficiency of FEEL. Although over-the-air computation can alleviate the excessive cost of radio resources in FEEL model uploading, practical implementations of over-the-air FEEL still suffer from several challenges, including strong straggler issues, large communication overheads, and potential privacy leakage. In this article, we study these challenges in over-the-air FEEL and leverage reconfigurable intelligent surfaces (RIS), a key enabler of future wireless systems, to address them. We review the state-of-the-art solutions on RIS-empowered FEEL and explore promising research opportunities for adopting RIS to enhance FEEL performance.

【4】 Fair Federated Learning for Heterogeneous Face Data Link: https://arxiv.org/abs/2109.02351

Authors: Samhita Kanaparthy, Manisha Padala, Sankarshan Damle, Sujit Gujar Affiliations: Machine Learning Lab, IIIT Hyderabad, Hyderabad, India Abstract: We consider the problem of achieving fair classification in Federated Learning (FL) under data heterogeneity. Most of the approaches proposed for fair classification require diverse data that represent the different demographic groups involved. In contrast, it is common for each client to own data that represents only a single demographic group. Hence, the existing approaches cannot be adopted for fair classification models at the client level. To resolve this challenge, we propose several aggregation techniques. We empirically validate these techniques by comparing the resulting fairness metrics and accuracy on the CelebA, UTK, and FairFace datasets.

Inference | Analysis | Understanding | Explanation (5 papers)

【1】 Multi-Agent Variational Occlusion Inference Using People as Sensors Link: https://arxiv.org/abs/2109.02173

Authors: Masha Itkina, Ye-Ji Mun, Katherine Driggs-Campbell, Mykel J. Kochenderfer Affiliations: Stanford University; University of Illinois Urbana-Champaign Note: 21 pages, 11 figures Abstract: Autonomous vehicles must reason about spatial occlusions in urban environments to ensure safety without being overly cautious. Prior work explored occlusion inference from observed social behaviors of road agents. Inferring occupancy from agent behaviors is an inherently multimodal problem; a driver may behave in the same manner for different occupancy patterns ahead of them (e.g., a driver may move at constant speed in traffic or on an open road). Past work, however, does not account for this multimodality, thus neglecting to model this source of aleatoric uncertainty in the relationship between driver behaviors and their environment. We propose an occlusion inference method that characterizes observed behaviors of human agents as sensor measurements and fuses them with those from a standard sensor suite. To capture the aleatoric uncertainty, we train a conditional variational autoencoder with a discrete latent space to learn a multimodal mapping from observed driver trajectories to an occupancy grid representation of the view ahead of the driver. Our method handles multi-agent scenarios, combining measurements from multiple observed drivers using evidential theory to solve the sensor fusion problem. Our approach is validated on a real-world dataset, outperforming baselines and demonstrating real-time capable performance. Our code is available at https://github.com/sisl/MultiAgentVariationalOcclusionInference .

【2】 SEC4SR: A Security Analysis Platform for Speaker Recognition Link: https://arxiv.org/abs/2109.01766

Authors: Guangke Chen, Zhe Zhao, Fu Song, Sen Chen, Lingling Fan, Yang Liu Affiliations: ShanghaiTech University; Tianjin University; Nankai University; Nanyang Technological University Abstract: Adversarial attacks have been expanded to speaker recognition (SR). However, existing attacks are often assessed using different SR models, recognition tasks and datasets, and only a few adversarial defenses borrowed from computer vision are considered. Yet, these defenses have not been thoroughly evaluated against adaptive attacks. Thus, there is still a lack of quantitative understanding of the strengths and limitations of adversarial attacks and defenses. More effective defenses are also required for securing SR systems. To bridge this gap, we present SEC4SR, the first platform enabling researchers to systematically and comprehensively evaluate adversarial attacks and defenses in SR. SEC4SR incorporates 4 white-box and 2 black-box attacks, and 24 defenses including our novel feature-level transformations. It also contains techniques for mounting adaptive attacks. Using SEC4SR, we conduct the largest-scale empirical study to date on adversarial attacks and defenses in SR, involving 23 defenses, 15 attacks and 4 attack settings. Our study provides many useful findings that may advance future research, such as: (1) all the transformations slightly degrade accuracy on benign examples and their effectiveness varies with attacks; (2) most transformations become less effective under adaptive attacks, but some become more effective; (3) few transformations combined with adversarial training yield stronger defenses over some but not all attacks, while our feature-level transformation combined with adversarial training yields the strongest defense over all the attacks. Extensive experiments demonstrate the capabilities and advantages of SEC4SR, which can benefit future research in SR.

【3】 Non-Euclidean Analysis of Joint Variations in Multi-Object Shapes Link: https://arxiv.org/abs/2109.02230

Authors: Zhiyuan Liu, Jörn Schulz, Mohsen Taheri, Martin Styner, James Damon, Stephen Pizer, J. S. Marron Abstract: This paper considers joint analysis of multiple functionally related structures in classification tasks. In particular, the method we develop is driven by how functionally correlated brain structures vary together between autism and control groups. To do so, we devised a method based on a novel combination of (1) non-Euclidean statistics that can faithfully represent non-Euclidean data in Euclidean spaces and (2) a non-parametric integrative analysis method that can decompose multi-block Euclidean data into joint, individual, and residual structures. We find that the resulting joint structure is effective, robust, and interpretable in recognizing the underlying patterns of the joint variation of multi-block non-Euclidean data. We verified the method in classifying structural shape data collected from cases that did and did not develop into Autism Spectrum Disorder (ASD).

【4】 Towards high-accuracy deep learning inference of compressible turbulent flows over aerofoils Link: https://arxiv.org/abs/2109.02183

Authors: Li-Wei Chen, Nils Thuerey Affiliations: Technical University of Munich, Garching, Germany Abstract: The present study investigates the accurate inference of Reynolds-averaged Navier-Stokes solutions for compressible flow over aerofoils in two dimensions with a deep neural network. Our approach yields networks that learn to generate precise flow fields for varying body-fitted, structured grids by providing them with an encoding of the corresponding mapping to a canonical space for the solutions. We apply the deep neural network model to a benchmark case of incompressible flow at randomly given angles of attack and Reynolds numbers, and achieve an improvement of more than an order of magnitude compared to previous work. Further, for transonic flow cases, the deep neural network model accurately predicts complex flow behaviour at high Reynolds numbers, such as shock wave/boundary layer interaction, and quantitative distributions like the pressure coefficient, the skin friction coefficient, as well as wake total pressure profiles downstream of aerofoils. The proposed deep learning method significantly speeds up the prediction of flow fields and shows promise for enabling fast aerodynamic designs.

【5】 Optimal transport weights for causal inference Link: https://arxiv.org/abs/2109.01991

Authors: Eric Dunipace Affiliations: David Geffen School of Medicine at UCLA; Department of Biostatistics, Chan School of Public Health Abstract: Weighting methods are a common tool to de-bias estimates of causal effects. And though there is an increasing number of seemingly disparate methods, many of them can be folded into one unifying regime: causal optimal transport. This new method directly targets distributional balance by minimizing optimal transport distances between treatment and control groups or, more generally, between a source and target population. Our approach is model-free but can also incorporate moments or any other important functions of covariates that the researcher desires to balance. We find that causal optimal transport outperforms competitor methods when both the propensity score and outcome models are misspecified, indicating it is a robust alternative to common weighting methods. Finally, we demonstrate the utility of our method in an external control study examining the effect of misoprostol versus oxytocin for the treatment of post-partum hemorrhage.
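
A small sketch of an OT-based estimate of the effect on the treated, using the POT library; the barycentric counterfactual below is one standard construction and is meant only to illustrate the idea, not the paper's exact weighting scheme.

import numpy as np
import ot  # Python Optimal Transport (POT)

def ot_att(X_t, X_c, Y_t, Y_c):
    a = np.full(len(X_t), 1.0 / len(X_t))  # uniform mass on treated units
    b = np.full(len(X_c), 1.0 / len(X_c))  # uniform mass on control units
    G = ot.emd(a, b, ot.dist(X_t, X_c))    # optimal transport plan
    # Barycentric counterfactual: each treated unit's untreated outcome is
    # a weighted average of control outcomes, with weights from its plan row.
    y0_hat = (G @ Y_c) / a
    return np.mean(Y_t - y0_hat)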

Detection-related (2 papers)

【1】 Uncovering the Limits of Text-based Emotion Detection Link: https://arxiv.org/abs/2109.01900

Authors: Nurudin Alvarez-Gonzalez, Andreas Kaltenbrunner, Vicenç Gómez Affiliations: Universitat Pompeu Fabra, Barcelona, Spain Abstract: Identifying emotions from text is crucial for a variety of real-world tasks. We consider the two largest now-available corpora for emotion classification: GoEmotions, with 58k messages labelled by readers, and Vent, with 33M writer-labelled messages. We design a benchmark and evaluate several feature spaces and learning algorithms, including two simple yet novel models on top of BERT that outperform previous strong baselines on GoEmotions. Through an experiment with human participants, we also analyze the differences between how writers express emotions and how readers perceive them. Our results suggest that emotions expressed by writers are harder to identify than emotions that readers perceive. We share a public web interface for researchers to explore our models.

【2】 Postulating Exoplanetary Habitability via a Novel Anomaly Detection Method Link: https://arxiv.org/abs/2109.02273

Authors: Jyotirmoy Sarkar, Kartik Bhatia, Snehanshu Saha, Margarita Safonova, Santonu Sarkar Affiliations: CSIS and APPCAIR, BITS Pilani, K. K. Birla Goa Campus, Goa, India; Indian Institute of Astrophysics, Bangalore, India Note: 12 pages, 3 figures, submitted to MNRAS Abstract: A profound shift in the study of cosmology came with the discovery of thousands of exoplanets and the possibility of the existence of billions of them in our Galaxy. The biggest goal in these searches is whether there are other life-harbouring planets. However, the question of which of these detected planets are habitable, potentially-habitable, or maybe even inhabited is still not answered. Some potentially habitable exoplanets have been hypothesized, but since Earth is the only known habitable planet, measures of habitability are necessarily determined with Earth as the reference. Several recent works introduced new habitability metrics based on optimization methods. Classification of potentially habitable exoplanets using supervised learning is another emerging area of study. However, both modeling and supervised learning approaches suffer from drawbacks. We propose an anomaly detection method, the Multi-Stage Memetic Algorithm (MSMA), to detect anomalies and extend it to an unsupervised clustering algorithm, MSMVMCA, to detect potentially habitable exoplanets as anomalies. The algorithm is based on the postulate that Earth is an anomaly, with the possibility of the existence of a few other anomalies among thousands of data points. We describe an MSMA-based clustering approach with a novel distance function to detect habitable candidates as anomalies (including Earth). The results are cross-matched with the habitable exoplanet catalog (PHL-HEC) of the Planetary Habitability Laboratory (PHL), with both optimistic and conservative lists of potentially habitable exoplanets.

Classification | Recognition (7 papers)

【1】 An Enhanced Machine Learning Topic Classification Methodology for Cybersecurity Link: https://arxiv.org/abs/2109.02473

Authors: Elijah Pelofske, Lorie M. Liebrock, Vincent Urias Affiliations: Cybersecurity Centers, New Mexico Institute of Mining and Technology, Socorro, New Mexico, USA; Sandia National Laboratories, Albuquerque, New Mexico, USA Abstract: In this research, we use user-defined labels from three internet text sources (Reddit, Stackexchange, Arxiv) to train 21 different machine learning models for the topic classification task of detecting cybersecurity discussions in natural text. We analyze the false positive and false negative rates of each of the 21 models in a cross-validation experiment. We then present a Cybersecurity Topic Classification (CTC) tool, which takes the majority vote of the 21 trained machine learning models as the decision mechanism for detecting cybersecurity-related text. We also show that the majority-vote mechanism of the CTC tool provides lower false negative and false positive rates on average than any of the 21 individual models. We show that the CTC tool is scalable to hundreds of thousands of documents, with wall-clock times on the order of hours.
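
The CTC decision mechanism reduces to a simple majority vote, sketched below with hypothetical classifier objects exposing a binary predict method.

def is_cybersecurity_related(classifiers, text):
    # Flag the document if more than half of the 21 trained models agree.
    votes = sum(clf.predict(text) for clf in classifiers)  # each 0 or 1
    return votes > len(classifiers) / 2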

【2】 Visual Recognition with Deep Learning from Biased Image Datasets Link: https://arxiv.org/abs/2109.02357

Authors: Robin Vogel, Stephan Clémençon, Pierre Laforgue Affiliations: School of Informatics, University of Edinburgh, Edinburgh, United Kingdom; LTCI, Institut Polytechnique de Paris, Telecom Paris, Palaiseau, France; Department of Computer Science, Università degli Studi di Milano, Milano, Italy Note: 11 pages, 9 figures, 3 tables Abstract: In practice, and especially when training deep neural networks, visual recognition rules are often learned based on various sources of information. On the other hand, the recent deployment of facial recognition systems with uneven predictive performances on different population segments highlights the representativeness issues possibly induced by a naive aggregation of image datasets. Indeed, sampling bias does not vanish simply by considering larger datasets, and ignoring its impact may completely jeopardize the generalization capacity of the learned prediction rules. In this paper, we show how biasing models, originally introduced for nonparametric estimation in (Gill et al., 1988), and recently revisited from the perspective of statistical learning theory in (Laforgue and Clémençon, 2019), can be applied to remedy these problems in the context of visual recognition. Based on the (approximate) knowledge of the biasing mechanisms at work, our approach consists in reweighting the observations so as to form a nearly debiased estimator of the target distribution. One key condition for our method to be theoretically valid is that the supports of the distributions generating the biased datasets at disposal must overlap and cover the support of the target distribution. In order to meet this requirement in practice, we propose to use a low-dimensional image representation shared across the image databases. Finally, we provide numerical experiments highlighting the relevance of our approach whenever the biasing functions are appropriately chosen.
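
The core reweighting idea can be sketched in a few lines; here bias_weights holds the (approximately) known biasing function w(x) under which each sample was drawn, and the example covers only the simplest one-dataset case.

import numpy as np

def debiased_mean(values, bias_weights):
    # Observations drawn with probability proportional to w(x) are weighted
    # by 1 / w(x), giving a nearly debiased estimate under the target law.
    inv = 1.0 / np.asarray(bias_weights, dtype=float)
    return np.sum(inv * np.asarray(values)) / np.sum(inv)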

【3】 Does Melania Trump have a body double from the perspective of automatic face recognition? Link: https://arxiv.org/abs/2109.02283

Authors: Khawla Mallat, Fabiola Becerra-Riera, Annette Morales-González, Heydi Méndez-Vázquez, Jean-Luc Dugelay Affiliations: Digital Security Department, EURECOM, Campus SophiaTech, Biot, France; Advanced Technologies Application Center (CENATAV), Havana, Cuba Abstract: In this paper, we explore whether automatic face recognition can help in verifying widespread misinformation on social media, particularly conspiracy theories that are based on the existence of body doubles. The conspiracy theory addressed in this paper is the case of the Melania Trump body double. We employed four different state-of-the-art descriptors for face recognition to verify the integrity of the claim of the studied conspiracy theory. In addition, we assessed the impact of different image quality metrics on the variation of face recognition results. Two sets of image quality metrics were considered: acquisition-related metrics and subject-related metrics.

【4】 The Phonexia VoxCeleb Speaker Recognition Challenge 2021 System Description Link: https://arxiv.org/abs/2109.02052

Authors: Josef Slavíček, Albert Swart, Michal Klčo, Niko Brümmer Affiliations: Phonexia s.r.o, Brno, Czech Republic Note: System description for VoxSRC-21: Speaker Recognition Challenge Abstract: We describe the Phonexia submission for the VoxCeleb Speaker Recognition Challenge 2021 (VoxSRC-21) in the unsupervised speaker verification track. Our solution was very similar to IDLab's winning submission for VoxSRC-20. An embedding extractor was bootstrapped using momentum contrastive learning, with input augmentations as the only source of supervision. This was followed by several iterations of clustering to assign pseudo-speaker labels that were then used for supervised embedding extractor training. Finally, a score fusion was done by averaging the zt-normalized cosine scores of five different embedding extractors. We also briefly describe unsuccessful solutions involving i-vectors instead of DNN embeddings and PLDA instead of cosine scoring.
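
A simplified sketch of the zt-normalisation and averaging used in the fusion step; real systems normalise the cohort scores themselves before computing the t-norm statistics, which is glossed over here.

import numpy as np

def zt_norm(score, enroll_cohort, test_cohort):
    # z-norm with impostor-cohort statistics of the enrollment model,
    # then t-norm with cohort statistics against the test utterance.
    z = (score - enroll_cohort.mean()) / enroll_cohort.std()
    zc = (test_cohort - enroll_cohort.mean()) / enroll_cohort.std()
    return (z - zc.mean()) / zc.std()

def fuse(system_scores):
    return float(np.mean(system_scores))  # average of zt-normed cosine scores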

【5】 Attentive Neural Controlled Differential Equations for Time-series Classification and Forecasting Link: https://arxiv.org/abs/2109.01876

Authors: Sheo Yon Jhin, Heejoo Shin, Seoyoung Hong, Solhee Park, Noseong Park Note: Accepted in ICDM 2021 Abstract: Neural networks inspired by differential equations have proliferated for the past several years. Neural ordinary differential equations (NODEs) and neural controlled differential equations (NCDEs) are two representative examples. In theory, NCDEs provide better representation learning capability for time-series data than NODEs. In particular, it is known that NCDEs are suitable for processing irregular time-series data. While NODEs have been successfully extended by adopting attention, it has not yet been studied how to integrate attention into NCDEs. To this end, we present the method of Attentive Neural Controlled Differential Equations (ANCDEs) for time-series classification and forecasting, where dual NCDEs are used: one for generating attention values, and the other for evolving hidden vectors for a downstream machine learning task. We conduct experiments with three real-world time-series datasets and 10 baselines. After dropping some values, we also conduct irregular time-series experiments. Our method consistently shows the best accuracy in all cases by non-trivial margins. Our visualizations also show that the presented attention mechanism works as intended by focusing on crucial information.

【6】 A realistic approach to generate masked faces applied on two novel masked face recognition data sets Link: https://arxiv.org/abs/2109.01745

Authors: Tudor Mare, Georgian Duta, Mariana-Iuliana Georgescu, Adrian Sandru, Bogdan Alexe, Marius Popescu, Radu Tudor Ionescu Affiliations: SecurifAI; University of Bucharest Abstract: The COVID-19 pandemic raises the problem of adapting face recognition systems to the new reality, where people may wear surgical masks to cover their noses and mouths. Traditional data sets (e.g., CelebA, CASIA-WebFace) used for training these systems were released before the pandemic, so they now seem unsuited due to the lack of examples of people wearing masks. We propose a method for enhancing data sets containing faces without masks by creating synthetic masks and overlaying them on the faces in the original images. Our method relies on Spark AR Studio, a developer program made by Facebook that is used to create Instagram face filters. In our approach, we use 9 masks of different colors, shapes and fabrics. We employ our method to generate 445,446 (90%) masked samples for the CASIA-WebFace data set and 196,254 (96.8%) masked samples for the CelebA data set, releasing the mask images at https://github.com/securifai/masked_faces. We show that our method produces significantly more realistic training examples of masks overlaid on faces by asking volunteers to qualitatively compare it to other methods or data sets designed for the same task. We also demonstrate the usefulness of our method by evaluating state-of-the-art face recognition systems (FaceNet, VGG-face, ArcFace) trained on the enhanced data sets, showing that they outperform equivalent systems trained on the original data sets (containing faces without masks) when the test benchmark contains masked faces.

【7】 Revisiting 3D ResNets for Video Recognition Link: https://arxiv.org/abs/2109.01696

Authors: Xianzhi Du, Yeqing Li, Yin Cui, Rui Qian, Jing Li, Irwan Bello Affiliations: Google Research Note: 6 pages Abstract: A recent work from Bello shows that training and scaling strategies may be more significant than model architectures for visual recognition. This short note studies effective training and scaling strategies for video recognition models. We propose a simple scaling strategy for 3D ResNets, in combination with improved training strategies and minor architectural changes. The resulting models, termed 3D ResNet-RS, attain competitive performance of 81.0 on Kinetics-400 and 83.8 on Kinetics-600 without pre-training. When pre-trained on a large Web Video Text dataset, our best model achieves 83.5 and 84.3 on Kinetics-400 and Kinetics-600, respectively. The proposed scaling rule is further evaluated in a self-supervised setup using contrastive learning, demonstrating improved performance. Code is available at: https://github.com/tensorflow/models/tree/master/official.

Representation learning (4 papers)

【1】 Learning Interpretable Representations of Entanglement in Quantum Optics Experiments using Deep Generative Models Link: https://arxiv.org/abs/2109.02490

Authors: Daniel Flam-Shepherd, Tony Wu, Xuemei Gu, Alba Cervera-Lierta, Mario Krenn, Alan Aspuru-Guzik Affiliations: Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; Hefei National Laboratory for Physical Sciences at the Microscale and Department of Modern Physics Abstract: Quantum physics experiments produce interesting phenomena such as interference or entanglement, which is a core property of numerous future quantum technologies. The complex relationship between a quantum experiment's structure and its entanglement properties is essential to fundamental research in quantum optics but is difficult to intuitively understand. We present the first deep generative model of quantum optics experiments, where a variational autoencoder (QOVAE) is trained on a dataset of experimental setups. In a series of computational experiments, we investigate the learned representation of the QOVAE and its internal understanding of the quantum optics world. We demonstrate that the QOVAE learns an interpretable representation of quantum optics experiments and the relationship between experiment structure and entanglement. We show the QOVAE is able to generate novel experiments for highly entangled quantum states with specific distributions that match its training data. Importantly, we are able to fully interpret how the QOVAE structures its latent space, finding curious patterns that we can entirely explain in terms of quantum physics. The results demonstrate how we can successfully use and understand the internal representations of deep generative models in a complex scientific domain. The QOVAE and the insights from our investigations can be immediately applied to other physical systems throughout fundamental scientific research.

【2】 Learning with Holographic Reduced Representations Link: https://arxiv.org/abs/2109.02157

Authors: Ashwinkumar Ganesan, Hang Gao, Sunil Gandhi, Edward Raff, Tim Oates, James Holt, Mark McLean Affiliations: University of Maryland, Baltimore County; Laboratory for Physical Sciences; Booz Allen Hamilton Abstract: Holographic Reduced Representations (HRR) are a method for performing symbolic AI on top of real-valued vectors (Plate, 1995) by associating each vector with an abstract concept and providing mathematical operations to manipulate vectors as if they were classic symbolic objects. This method has seen little use outside of older symbolic AI work and cognitive science. Our goal is to revisit this approach to understand if it is viable for enabling a hybrid neural-symbolic approach to learning as a differentiable component of a deep learning architecture. HRRs today are not effective in a differentiable solution due to numerical instability, a problem we solve by introducing a projection step that forces the vectors to exist at a well-behaved point in space. In doing so, we improve the concept retrieval efficacy of HRRs by over 100x. Using multi-label classification, we demonstrate how to leverage the symbolic HRR properties to develop an output layer and loss function that is able to learn effectively, and we investigate some of the pros and cons of an HRR neuro-symbolic learning approach.
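
The numerical-stability fix can be illustrated concretely: HRR binding is circular convolution, and projecting vectors to unit-magnitude Fourier coefficients makes unbinding exact. A minimal NumPy sketch (our reading of the projection step, stated as an assumption):

import numpy as np

def project(x):
    f = np.fft.fft(x)
    return np.real(np.fft.ifft(f / np.abs(f)))  # unit-magnitude spectrum

def bind(a, b):   # circular convolution
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

def unbind(c, a):  # circular correlation with the approximate inverse of a
    return np.real(np.fft.ifft(np.fft.fft(c) * np.conj(np.fft.fft(a))))

d = 1024
a, b = project(np.random.randn(d)), project(np.random.randn(d))
print(np.corrcoef(unbind(bind(a, b), a), b)[0, 1])  # ~1.0: b is retrieved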

【3】 On robustness of generative representations against catastrophic forgetting Link: https://arxiv.org/abs/2109.01844

Authors: Wojciech Masarczyk, Kamil Deja, Tomasz Trzciński Affiliations: Warsaw University of Technology; Jagiellonian University; Tooploox Abstract: Catastrophic forgetting of previously learned knowledge while learning new tasks is a widely observed limitation of contemporary neural networks. Although many continual learning methods have been proposed to mitigate this drawback, the main question remains unanswered: what is the root cause of catastrophic forgetting? In this work, we aim at answering this question by posing and validating a set of research hypotheses related to the specificity of representations built internally by neural models. More specifically, we design a set of empirical evaluations that compare the robustness of representations in discriminative and generative models against catastrophic forgetting. We observe that representations learned by discriminative models are more prone to catastrophic forgetting than their generative counterparts, which sheds new light on the advantages of developing generative models for continual learning. Finally, our work opens new research pathways and possibilities for adopting generative models in continual learning beyond mere replay mechanisms.

【4】 Data-Driven Learning of 3-Point Correlation Functions as Microstructure Representations Link: https://arxiv.org/abs/2109.02255

Authors: Sheng Cheng, Yang Jiao, Yi Ren Affiliations: Computer Science; Materials Science and Engineering; Aerospace and Mechanical Engineering, Arizona State University, Tempe, AZ, United States Note: submitted to Acta Materialia Abstract: This paper considers the open challenge of identifying complete, concise, and explainable quantitative microstructure representations for disordered heterogeneous material systems. Completeness and conciseness have been achieved through existing data-driven methods, e.g., deep generative models, which, however, do not provide mathematically explainable latent representations. This study investigates representations composed of three-point correlation functions, which are a special type of spatial convolutions. We show that a variety of microstructures can be characterized by a concise subset of three-point correlations, and the identification of such subsets can be achieved by Bayesian optimization. Lastly, we show that the proposed representation can directly be used to compute material properties based on the effective medium theory.
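
A three-point correlation of a binary microstructure is just the probability that three points in a fixed triangle configuration all fall in the phase of interest; a toy NumPy estimate with periodic boundaries:

import numpy as np

def three_point_correlation(img, d1, d2):
    # P[x, x + d1, x + d2 all in phase 1], averaged over all pixels.
    b = np.roll(img, shift=d1, axis=(0, 1))
    c = np.roll(img, shift=d2, axis=(0, 1))
    return float(np.mean(img * b * c))

img = (np.random.rand(64, 64) < 0.4).astype(float)  # toy two-phase medium
print(three_point_correlation(img, (0, 3), (3, 0)))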

3D | 3D reconstruction, etc. (1 paper)

【1】 Estimating permeability of 3D micro-CT images by physics-informed CNNs based on DNS Link: https://arxiv.org/abs/2109.01818

Authors: Stephan Gärttner, Faruk O. Alpak, Andreas Meier, Nadja Ray, Florian Frank Affiliations: Friedrich-Alexander-Universität Erlangen-Nürnberg, Department Mathematik, Erlangen, Germany; Shell Technology Center, Houston, TX, USA Abstract: In recent years, convolutional neural networks (CNNs) have experienced increasing interest for their ability to perform fast approximation of effective hydrodynamic parameters in porous media research and applications. This paper presents a novel methodology for permeability prediction from micro-CT scans of geological rock samples. The training data set for CNNs dedicated to permeability prediction consists of permeability labels that are typically generated by classical lattice Boltzmann methods (LBM), which simulate the flow through the pore space of the segmented image data. We instead perform direct numerical simulation (DNS) by solving the stationary Stokes equation in an efficient and distributed-parallel manner. As such, we circumvent the convergence issues of LBM that are frequently observed on complex pore geometries, and therefore improve the generality and accuracy of our training data set. Using the DNS-computed permeabilities, a physics-informed CNN (PhyCNN) is trained by additionally providing a tailored characteristic quantity of the pore space. More precisely, by exploiting the connection to flow problems on a graph representation of the pore space, additional information about confined structures is provided to the network in terms of the maximum flow value, which is the key innovative component of our workflow. As a result, unprecedented prediction accuracy and robustness are observed for a variety of sandstone samples from archetypal rock formations.
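
The maximum-flow feature can be illustrated on a toy pore graph with networkx; the node names and capacities below are invented for the example, and the real workflow builds the graph from the segmented pore space.

import networkx as nx

G = nx.DiGraph()
G.add_edge("inlet", "p1", capacity=3.0)
G.add_edge("p1", "p2", capacity=1.0)  # a narrow throat limits the flow
G.add_edge("p1", "p3", capacity=2.0)
G.add_edge("p2", "outlet", capacity=2.0)
G.add_edge("p3", "outlet", capacity=2.0)
flow_value, _ = nx.maximum_flow(G, "inlet", "outlet")
print(flow_value)  # 3.0, supplied to the CNN as an extra input feature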

Encoders (1 paper)

【1】 PermuteFormer: Efficient Relative Position Encoding for Long Sequences Link: https://arxiv.org/abs/2109.02377

Authors: Peng Chen Affiliations: Peking University Note: Accepted by EMNLP 2021 Abstract: A recent variant of Transformer, Performer, scales Transformer to longer sequences with a linear attention mechanism. However, it is not compatible with relative position encoding, which has advantages over absolute position encoding. In this paper, we discuss possible ways to add relative position encoding to Performer. Based on the analysis, we propose PermuteFormer, a Performer-based model with relative position encoding that scales linearly on long sequences. PermuteFormer applies a position-dependent transformation on queries and keys to encode positional information into the attention module. This transformation is carefully crafted so that the final output of self-attention is not affected by the absolute positions of tokens. PermuteFormer introduces negligible computational overhead by design, so that it runs as fast as Performer. We evaluate PermuteFormer on Long-Range Arena, a dataset for long sequences, as well as WikiText-103, a language modeling dataset. The experiments show that PermuteFormer uniformly improves the performance of Performer with almost no computational overhead and outperforms vanilla Transformer on most of the tasks.
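
The position-dependent transformation can be imitated with iterated feature permutations: if the token at position t receives the permutation composed t times, the query-key product depends only on the relative offset. A simplified NumPy sketch (not the paper's exact per-head construction):

import numpy as np

def positional_permute(x, perm):
    out = np.empty_like(x)
    idx = np.arange(x.shape[-1])
    for t in range(x.shape[0]):
        out[t] = x[t, idx]   # position t uses perm composed t times
        idx = perm[idx]
    return out

rng = np.random.default_rng(0)
L, d = 6, 8
q, k = rng.normal(size=(L, d)), rng.normal(size=(L, d))
perm = rng.permutation(d)
q_p, k_p = positional_permute(q, perm), positional_permute(k, perm)
# q_p[i] @ k_p[j] equals q[i] paired with k[j] permuted by (j - i) steps,
# so relative position enters the attention scores at no extra cost.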

Optimization | Convergence (3 papers)

【1】 Application of Monte Carlo Stochastic Optimization (MOST) to Deep Learning Link: https://arxiv.org/abs/2109.02441

Authors: Sin-ichi Inage, Hana Hebishima Affiliations: Fluid Engineering Laboratory, Department of Mechanical Engineering, Fukuoka University, Fukuoka, Japan Note: 13 pages, 8 figures Abstract: In this paper, we apply the Monte Carlo stochastic optimization (MOST) proposed by the authors to deep learning of the XOR gate and verify its effectiveness. Deep machine learning based on neural networks is one of the most important keywords driving innovation in today's highly advanced information society, and research on large-scale, high-speed, and high-precision systems is therefore very active. To efficiently search for the optimum of the objective function, the method splits the search region of each of the multivariable parameters constituting the objective function into two, numerically estimates the integral of the objective function over the two sub-regions by the Monte Carlo method, compares the magnitudes of the integrals, and judges in which of the two sub-regions the optimum point lies. In a previous paper, we examined benchmark problems for this optimization method. Here, the method is applied to a neural network for the XOR gate and compared with weight-factor optimization by Adam and by a genetic algorithm. As a result, it was confirmed that it converges faster than the existing methods.
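
The region-splitting idea reads directly as code; the sketch below halves each coordinate's interval, compares Monte Carlo estimates of the objective's integral over the two halves, and keeps the half with the smaller integral (assumed, as in the abstract, to contain the minimiser).

import numpy as np

def most_minimize(f, lo, hi, n_iter=40, n_samples=500):
    lo, hi = np.array(lo, float), np.array(hi, float)

    def half_integral(d, a, b):
        # Monte Carlo estimate of f integrated over one half of coordinate d.
        pts = np.random.uniform(lo, hi, size=(n_samples, len(lo)))
        pts[:, d] = np.random.uniform(a, b, size=n_samples)
        return np.mean([f(p) for p in pts])

    for _ in range(n_iter):
        for d in range(len(lo)):
            mid = 0.5 * (lo[d] + hi[d])
            if half_integral(d, lo[d], mid) < half_integral(d, mid, hi[d]):
                hi[d] = mid
            else:
                lo[d] = mid
    return 0.5 * (lo + hi)

print(most_minimize(lambda x: np.sum((x - 0.3) ** 2), [-1, -1], [1, 1]))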

【2】 Optimal Prediction of Unmeasured Output from Measurable Outputs In LTI Systems Link: https://arxiv.org/abs/2109.02384

Authors: Deividas Eringis, John Leth, Zheng-Hua Tan, Rafal Wisniewski, Mihaly Petreczky Affiliations: Aalborg University, Denmark; Centre de Recherche en Informatique, Signal et Automatique de Lille (CRIStAL), Lille, France Abstract: In this short article, we showcase the derivation of an optimal predictor for the case when one part of a system's output is not measured but can be predicted from the rest of the system's output, which is measured. To the authors' knowledge, similar derivations have been done before, but not in a state-space representation.

【3】 On Faster Convergence of Scaled Sign Gradient Descent Link: https://arxiv.org/abs/2109.01806

Authors: Xiuxian Li, Kuo-Yi Lin, Li Li, Yiguang Hong, Jie Chen Affiliations: Department of Control Science and Engineering, Shanghai Research Institute for Intelligent Autonomous Systems, Tongji University Abstract: Communication has been seen as a significant bottleneck in industrial applications over large-scale networks. To alleviate the communication burden, sign-based optimization algorithms have recently gained popularity in both industry and academia, and they are closely related to adaptive gradient methods such as Adam. Along this line, this paper investigates faster convergence for a variant of sign-based gradient descent, called scaled signGD, in three cases: 1) the objective function is strongly convex; 2) the objective function is nonconvex but satisfies the Polyak-Lojasiewicz (PL) inequality; 3) the gradient is stochastic, called scaled signGD in this case. For the first two cases, it can be shown that scaled signGD converges at a linear rate. For case 3), the algorithm is shown to converge linearly to a neighborhood of the optimal value when a constant learning rate is employed, and the algorithm converges at a rate of O(1/k) when using a diminishing learning rate, where k is the iteration number. The results are also extended to the distributed setting by majority vote in a parameter-server framework. Finally, numerical experiments on logistic regression are performed to corroborate the theoretical findings.
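
A NumPy sketch of the update with one common scaling choice (the mean absolute gradient); the paper's exact scaling may differ, so treat this as illustrative.

import numpy as np

def scaled_signgd(grad, x0, lr=0.1, n_iter=200):
    # Only sign(g) plus one scalar per step would need to be communicated,
    # which is where the bandwidth saving comes from.
    x = np.array(x0, dtype=float)
    for _ in range(n_iter):
        g = grad(x)
        x -= lr * np.abs(g).mean() * np.sign(g)
    return x

# Strongly convex quadratic: the iterates approach the minimiser at 1.
print(scaled_signgd(lambda x: 2.0 * (x - 1.0), np.zeros(3)))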

Forecasting | Estimation (5 papers)

【1】 Scikit-dimension: a Python package for intrinsic dimension estimation Link: https://arxiv.org/abs/2109.02596

Authors: Jonathan Bac, Evgeny M. Mirkes, Alexander N. Gorban, Ivan Tyukin, Andrei Zinovyev Affiliations: Institut Curie, PSL Research University, Paris, France; INSERM, Paris, France; CBIO-Centre for Computational Biology, Mines ParisTech, PSL Research University, Paris, France; Department of Mathematics, University of Leicester, Leicester, UK Note: 12 pages, 4 figures, 1 table Abstract: Dealing with uncertainty in applications of machine learning to real-life data critically depends on the knowledge of intrinsic dimensionality (ID). A number of methods have been suggested for the purpose of estimating ID, but no standard package to easily apply them one by one or all at once has been implemented in Python. This technical note introduces scikit-dimension, an open-source Python package for intrinsic dimension estimation. The scikit-dimension package provides a uniform implementation of most of the known ID estimators based on the scikit-learn application programming interface to evaluate global and local intrinsic dimension, as well as generators of synthetic toy and benchmark datasets widespread in the literature. The package is developed with tools assessing code quality, coverage, unit testing and continuous integration. We briefly describe the package and demonstrate its use in a large-scale (more than 500 datasets) benchmarking of methods for ID estimation on real-life and synthetic data. The source code is available from https://github.com/j-bac/scikit-dimension , and the documentation is available from https://scikit-dimension.readthedocs.io .
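
Typical usage follows the scikit-learn estimator pattern; the names below are taken from the package README and should be treated as assumptions of this sketch.

import numpy as np
import skdim  # pip install scikit-dimension

# 1000 points on a 5-D hyperball embedded in 10-D ambient space.
X = np.zeros((1000, 10))
X[:, :5] = skdim.datasets.hyperBall(n=1000, d=5, radius=1.0, random_state=0)

est = skdim.id.TwoNN().fit(X)  # one of the implemented ID estimators
print(est.dimension_)          # global intrinsic dimension, close to 5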

【2】 Global-Local Item Embedding for Temporal Set Prediction Link: https://arxiv.org/abs/2109.02074

Authors: Seungjae Jung, Young-Jin Park, Jisu Jeong, Kyung-Min Kim, Hiun Kim, Minkyu Kim, Hanock Kwak Note: 8 pages, 3 figures. To appear in RecSys 2021 LBR Abstract: Temporal set prediction is becoming increasingly important as many companies employ recommender systems in their online businesses, e.g., personalized purchase prediction of shopping baskets. While most previous techniques have focused on leveraging a user's history, the study of combining it with others' histories remains untapped potential. This paper proposes Global-Local Item Embedding (GLOIE), which learns to utilize the temporal properties of sets across whole users as well as within a user, coining the names global and local information to distinguish the two temporal patterns. GLOIE uses a Variational Autoencoder (VAE) and a dynamic graph-based model to capture global and local information, and then applies attention to integrate the resulting item embeddings. Additionally, we propose to use a Tweedie output for the decoder of the VAE, as it can easily model zero-inflated and long-tailed distributions, which are more suitable for several real-world data distributions than Gaussian or multinomial counterparts. When evaluated on three public benchmarks, our algorithm consistently outperforms previous state-of-the-art methods in most ranking metrics.

【3】 An empirical evaluation of attention-based multi-head models for improved turbofan engine remaining useful life prediction Link: https://arxiv.org/abs/2109.01761

Authors: Abiodun Ayodeji, Wenhai Wang, Jianzhong Su, Jianquan Yuan, Xinggao Liu Affiliations: State Key Laboratory of Industrial Control Technology, Control Department, Zhejiang University, Hangzhou, P. R. China; Tianjin Jinhang Technical Physics Institute, Tianjin, P. R. China Note: 32 pages, 13 figures, 8 tables Abstract: A single unit (head) is the conventional input feature extractor in deep learning architectures trained on multivariate time series signals. The importance of the fixed-dimensional vector representation generated by the single-head network has been demonstrated for industrial machinery condition monitoring and predictive maintenance. However, processing heterogeneous sensor signals with a single head may result in a model that cannot explicitly account for the diversity in time-varying multivariate inputs. This work extends conventional single-head deep learning models to a more robust form by developing context-specific heads to independently capture the inherent pattern of each sensor reading in multivariate time series signals. Using the turbofan aircraft engine benchmark dataset (CMAPSS), an extensive experiment is performed to verify the effectiveness and benefits of multi-head fully connected neurons, recurrent networks, convolutional networks, the transformer-style stand-alone attention network, and their variants for remaining useful life estimation. Moreover, the effect of different attention mechanisms on the multi-head models is also evaluated. In addition, each architecture's relative advantage and computational overhead are analyzed. Results show that utilizing the attention layer is task-sensitive and model-dependent, as it does not provide consistent improvement across the models investigated. The results are further compared with five state-of-the-art models, and the comparison shows that a relatively simple multi-head architecture performs better than the state-of-the-art models. The results presented in this study demonstrate the importance of multi-head models and attention mechanisms for an improved understanding of the remaining useful life of industrial assets.
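
A minimal PyTorch version of the multi-head idea, with one fully connected head per sensor channel; this is a sketch of the pattern, not the paper's exact recurrent, convolutional, or attention variants.

import torch
import torch.nn as nn

class MultiHeadRUL(nn.Module):
    def __init__(self, n_sensors, seq_len, hidden=16):
        super().__init__()
        # One context-specific head per sensor reading.
        self.heads = nn.ModuleList(
            [nn.Sequential(nn.Linear(seq_len, hidden), nn.ReLU())
             for _ in range(n_sensors)])
        self.out = nn.Linear(n_sensors * hidden, 1)  # RUL regression

    def forward(self, x):  # x: (batch, n_sensors, seq_len)
        feats = [head(x[:, i]) for i, head in enumerate(self.heads)]
        return self.out(torch.cat(feats, dim=-1))

model = MultiHeadRUL(n_sensors=14, seq_len=30)
print(model(torch.randn(8, 14, 30)).shape)  # torch.Size([8, 1])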

【4】 A Multi-view Multi-task Learning Framework for Multi-variate Time Series Forecasting Link: https://arxiv.org/abs/2109.01657

Authors: Jinliang Deng, Xiusi Chen, Renhe Jiang, Xuan Song, Ivor W. Tsang Note: 14 pages Abstract: Multi-variate time series (MTS) data is a ubiquitous class of data abstraction in the real world. Any instance of MTS is generated from a hybrid dynamical system, and their specific dynamics are usually unknown. The hybrid nature of such a dynamical system is a result of complex external attributes, such as geographic location and time of day, each of which can be categorized into either spatial or temporal attributes. Therefore, there are two fundamental views which can be used to analyze MTS data, namely the spatial view and the temporal view. Moreover, from each of these two views, we can partition the set of data samples of MTS into disjoint forecasting tasks in accordance with their associated attribute values. Then, samples of the same task will manifest similar forthcoming patterns, which are less sophisticated to predict in comparison with the original single-view setting. Considering this insight, we propose a novel multi-view multi-task (MVMT) learning framework for MTS forecasting. Instead of being explicitly presented in most scenarios, MVMT information is deeply concealed in the MTS data, which severely hinders the model from capturing it naturally. To this end, we develop two kinds of basic operations, namely task-wise affine transformation and task-wise normalization. Applying these two operations with prior knowledge of the spatial and temporal views allows the model to adaptively extract MVMT information while predicting. Extensive experiments on three datasets are conducted to illustrate that canonical architectures can be greatly enhanced by the MVMT learning framework in terms of both effectiveness and efficiency. In addition, we design rich case studies to reveal the properties of the representations produced at different phases in the entire prediction procedure.
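
Task-wise affine transformation amounts to a per-task scale and shift on the shared hidden features (FiLM-style); a minimal PyTorch sketch with hour-of-day tasks as an assumed example:

import torch
import torch.nn as nn

class TaskwiseAffine(nn.Module):
    def __init__(self, n_tasks, dim):
        super().__init__()
        self.gamma = nn.Embedding(n_tasks, dim)  # per-task scale
        self.beta = nn.Embedding(n_tasks, dim)   # per-task shift
        nn.init.ones_(self.gamma.weight)
        nn.init.zeros_(self.beta.weight)

    def forward(self, h, task_id):  # h: (batch, dim), task_id: (batch,)
        return self.gamma(task_id) * h + self.beta(task_id)

layer = TaskwiseAffine(n_tasks=24, dim=64)  # e.g. one task per hour of day
h = torch.randn(8, 64)
print(layer(h, torch.randint(0, 24, (8,))).shape)  # torch.Size([8, 64])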

【5】 Estimation of Bivariate Structural Causal Models by Variational Gaussian Process Regression Under Likelihoods Parametrised by Normalising Flows Link: https://arxiv.org/abs/2109.02521

作者:Nico Reick,Felix Wiewel,Alexander Bartler,Bin Yang 摘要:最先进的人工智能的一个主要缺点是缺乏可解释性。解决这个问题的一种方法是考虑因果关系。因果机制可以用结构因果模型来描述。在这项工作中,我们提出了一种估计二元结构因果模型的方法,将归一化流应用于后非线性模型的密度估计和变分高斯过程回归。它通过因果和残差的独立性或似然比检验,促进了因果发现,即区分因果。与简单的加性噪声模型相比,我们估计后非线性模型的方法能够更好地解释各种真实世界的因果对。尽管很难利用T“ubingen基准数据库中所有对的这种优势,但我们证明,将加性噪声模型方法与我们的方法相结合可以显著增强因果发现。 摘要:One major drawback of state-of-the-art artificial intelligence is its lack of explainability. One approach to solve the problem is taking causality into account. Causal mechanisms can be described by structural causal models. In this work, we propose a method for estimating bivariate structural causal models using a combination of normalising flows applied to density estimation and variational Gaussian process regression for post-nonlinear models. It facilitates causal discovery, i.e. distinguishing cause and effect, by either the independence of cause and residual or a likelihood ratio test. Our method which estimates post-nonlinear models can better explain a variety of real-world cause-effect pairs than a simple additive noise model. Though it remains difficult to exploit this benefit regarding all pairs from the T\"ubingen benchmark database, we demonstrate that combining the additive noise model approach with our method significantly enhances causal discovery.
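作为背景,下面用sklearn给出加性噪声模型(ANM)式因果方向判定的极简示意:分别回归 y~x 与 x~y,比较残差与输入的相关程度(此处用相关系数作为独立性的粗略代理;论文实际使用后非线性模型、归一化流与似然比检验,严格做法应采用HSIC等独立性检验):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 300)
y = np.tanh(2 * x) + 0.1 * rng.normal(size=300)  # 虚构数据, 真实方向: x -> y

def residual_dependence(a, b):
    """用GP回归 b~a, 返回残差与a的|相关系数|(越小越像正确的因果方向)。"""
    gp = GaussianProcessRegressor().fit(a.reshape(-1, 1), b)
    r = b - gp.predict(a.reshape(-1, 1))
    return abs(np.corrcoef(a, r)[0, 1])

print("x->y:", residual_dependence(x, y))
print("y->x:", residual_dependence(y, x))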

其他神经网络|深度学习|模型|建模(16篇)

【1】 EsmamDS: A more diverse exceptional survival model mining approach 标题:EsmamDS:一种更加多样化的例外生存模型挖掘方法 链接:https://arxiv.org/abs/2109.02610

作者:Juliana Barcellos Mattos,Paulo S. G. de Mattos Neto,Renato Vimieiro 机构:Centro de Informática (CIn), Universidade Federal de Pernambuco, Recife, Brazil, Depto. de Ciência da Computação (DCC), Universidade Federal de Minas Gerais, Belo Horizonte, Brazil 摘要:文献中已有许多工作试图揭示与生存行为相关的因素。然而,提供此类信息的计算工具是用于预测(生存)事件是否发生或何时发生的全局模型。在处理解释生存行为差异的问题时,这些方法依赖于(假设的)预测特征,再进行风险分层。换句话说,它们缺乏发现与生存相关因素的新信息的能力。相反,我们从描述性监督模式挖掘的角度来处理这个问题,以发现与不同生存行为相关的局部模式。因此,我们引入了EsmamDS算法:一个例外模型挖掘(Exceptional Model Mining)框架,用于对呈现异常生存模型(由Kaplan-Meier估计给出)的子组给出直接的特征刻画。这项工作建立在Esmam算法的基础上,以解决模式冗余问题,并为生存行为提供更具信息量和多样性的特征刻画。 摘要:A variety of works in the literature strive to uncover the factors associated with survival behaviour. However, the computational tools to provide such information are global models designed to predict if or when a (survival) event will occur. When approaching the problem of explaining differences in survival behaviour, those approaches rely on (assumptions of) predictive features followed by risk stratification. In other words, they lack the ability to discover new information on factors related to survival. In contrast, we approach such a problem from the perspective of descriptive supervised pattern mining to discover local patterns associated with different survival behaviours. Hence, we introduce the EsmamDS algorithm: an Exceptional Model Mining framework to provide straightforward characterisations of subgroups presenting unusual survival models -- given by the Kaplan-Meier estimates. This work builds on the Esmam algorithm to address the problem of pattern redundancy and provide a more informative and diverse characterisation of survival behaviour.
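为直观理解"以Kaplan-Meier估计刻画子组的异常生存模型",下面给出用lifelines比较子组与总体生存曲线的极简示例(数据为虚构,log-rank检验仅作"异常程度"的直观判据,并非EsmamDS本身):

import pandas as pd
from lifelines import KaplanMeierFitter
from lifelines.statistics import logrank_test

df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 14, 2, 7, 11, 6],   # 随访时间(虚构)
    "event": [1, 1, 0, 1, 1, 0, 1, 1, 0, 1],       # 1=事件发生, 0=删失
    "sub":   [1, 1, 1, 0, 0, 0, 0, 0, 1, 0],       # 某模式覆盖的子组
})

kmf_all = KaplanMeierFitter().fit(df["time"], df["event"], label="population")
kmf_sub = KaplanMeierFitter().fit(df.loc[df["sub"] == 1, "time"],
                                  df.loc[df["sub"] == 1, "event"], label="subgroup")
# 子组与其余样本的生存分布差异是否显著
res = logrank_test(df.loc[df["sub"] == 1, "time"], df.loc[df["sub"] == 0, "time"],
                   df.loc[df["sub"] == 1, "event"], df.loc[df["sub"] == 0, "event"])
print(res.p_value)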

【2】 Statistical Privacy Guarantees of Machine Learning Preprocessing Techniques 标题:机器学习预处理技术的统计隐私保证 链接:https://arxiv.org/abs/2109.02496

作者:Ashly Lau,Jonathan Passerat-Palmbach 备注:Accepted to the ICML 2021 Theory and Practice of Differential Privacy Workshop 摘要:差分隐私为机器学习应用提供了强大的隐私保障。最近的许多工作都集中在开发差分隐私模型上,但在机器学习管道的其他阶段,特别是预处理阶段,仍存在空白。我们的贡献有两个方面:我们将一个基于统计方法的隐私侵犯检测框架加以改造,以经验方式测量机器学习管道的隐私级别;并应用新建立的框架表明,处理不平衡数据集时使用的重采样技术会导致所得模型泄漏更多隐私。这些结果突显了开发隐私保护预处理技术的必要性。 摘要:Differential privacy provides strong privacy guarantees for machine learning applications. Much recent work has been focused on developing differentially private models, however there has been a gap in other stages of the machine learning pipeline, in particular during the preprocessing phase. Our contributions are twofold: we adapt a privacy violation detection framework based on statistical methods to empirically measure privacy levels of machine learning pipelines, and apply the newly created framework to show that resampling techniques used when dealing with imbalanced datasets cause the resultant model to leak more privacy. These results highlight the need for developing private preprocessing techniques.
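下面的小例子示意了为什么重采样会放大隐私风险:随机过采样会把少数类记录原样复制多份,使单条记录对模型的影响被放大(仅为直观说明;论文中的隐私度量基于统计检测框架,而非此处的计数):

import numpy as np
from collections import Counter
from imblearn.over_sampling import RandomOverSampler

X = np.arange(10).reshape(-1, 1)
y = np.array([0] * 8 + [1] * 2)          # 严重不平衡: 少数类仅2条
X_res, y_res = RandomOverSampler(random_state=0).fit_resample(X, y)
print(Counter(y_res))                     # 重采样后两类各8条
# 少数类的每条原始记录平均被复制4次, 成员推断等攻击因此更容易
print(np.unique(X_res[y_res == 1]))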

【3】 Data-Driven Wind Turbine Wake Modeling via Probabilistic Machine Learning 标题:基于概率机器学习的数据驱动风力机尾迹建模 链接:https://arxiv.org/abs/2109.02411

作者:S. Ashwin Renganathan,Romit Maulik,Stefano Letizia,Giacomo Valerio Iungo 机构:Mathematics & Computer Science, Department of Mechanical Engineering, The University of Texas at Dallas, Dallas, TX 备注:18 pages, 10 figures 摘要:风电场设计主要取决于风力涡轮机尾流随大气风况的变化以及尾流之间的相互作用。基于物理的模型能以高保真度捕捉尾流场,但用于风电场布局优化的计算代价非常高,因此数据驱动的降阶模型可以作为模拟风电场的高效替代。在这项工作中,我们使用风力涡轮机尾迹的真实世界光探测和测距(LiDAR)测量,借助机器学习构建预测代理模型。具体地说,我们首先演示了如何使用深度自动编码器寻找一个低维潜在(latent)空间,以对尾迹激光雷达测量给出计算上可处理的近似;然后,我们使用深度神经网络学习参数空间和(潜在空间)尾迹流场之间的映射。此外,我们还演示了使用概率机器学习技术,即高斯过程建模,在学习参数空间-潜在空间映射的同时刻画数据中的认知不确定性和偶然不确定性。最后,为了应对大型数据集的训练,我们演示了变分高斯过程模型的使用,它为大型数据集上的传统高斯过程模型提供了一种易于处理的替代方案。此外,我们引入了主动学习来自适应地建立和改进传统高斯过程模型的预测能力。总的来说,我们发现,我们的方法提供了风力涡轮机尾流场的精确近似,其查询成本比高保真物理模拟低若干个数量级。 摘要:Wind farm design primarily depends on the variability of the wind turbine wake flows to the atmospheric wind conditions, and the interaction between wakes. Physics-based models that capture the wake flow-field with high-fidelity are computationally very expensive to perform layout optimization of wind farms, and, thus, data-driven reduced order models can represent an efficient alternative for simulating wind farms. In this work, we use real-world light detection and ranging (LiDAR) measurements of wind-turbine wakes to construct predictive surrogate models using machine learning. Specifically, we first demonstrate the use of deep autoencoders to find a low-dimensional latent space that gives a computationally tractable approximation of the wake LiDAR measurements. Then, we learn the mapping between the parameter space and the (latent space) wake flow-fields using a deep neural network. Additionally, we also demonstrate the use of a probabilistic machine learning technique, namely, Gaussian process modeling, to learn the parameter-space-latent-space mapping in addition to the epistemic and aleatoric uncertainty in the data. Finally, to cope with training large datasets, we demonstrate the use of variational Gaussian process models that provide a tractable alternative to the conventional Gaussian process models for large datasets. Furthermore, we introduce the use of active learning to adaptively build and improve a conventional Gaussian process model predictive capability. Overall, we find that our approach provides accurate approximations of the wind-turbine wake flow field that can be queried at an orders-of-magnitude cheaper cost than those generated with high-fidelity physics-based simulations.

【4】 Data Efficient Masked Language Modeling for Vision and Language 标题:面向视觉和语言的数据高效掩蔽语言建模 链接:https://arxiv.org/abs/2109.02040

作者:Yonatan Bitton,Gabriel Stanovsky,Michael Elhadad,Roy Schwartz 机构:♦School of Computer Science and Engineering, The Hebrew University of Jerusalem, Israel, ♠ Department of Computer Science, Ben Gurion University, Israel 备注:Accepted to Findings of EMNLP 2021 摘要:掩蔽语言建模(MLM)是视觉-语言预训练的关键子任务之一。在跨模态设置中,句子中的标记被随机掩蔽,模型根据给定的图像和文本预测被掩蔽的标记。在本文中,我们观察到MLM在这种设置下的几个关键缺点。首先,由于图像标题往往较短,在三分之一的句子中,没有任何标记被采样到。其次,大部分被掩蔽的标记是停用词和标点符号,导致图像利用不足。我们研究了一系列针对跨模态设置的替代掩蔽策略来解决这些缺点,目的是在学习到的表征中更好地融合文本和图像。在对LXMERT模型进行预训练时,我们的替代掩蔽策略在三个下游任务上始终优于原始掩蔽策略,尤其是在低资源设置下。此外,在旨在引出图像对象的基于提示的探测任务中,我们的预训练方法大大优于基线模型。这些结果和我们的分析表明,我们的方法可以更好地利用训练数据。 摘要:Masked language modeling (MLM) is one of the key sub-tasks in vision-language pretraining. In the cross-modal setting, tokens in the sentence are masked at random, and the model predicts the masked tokens given the image and the text. In this paper, we observe several key disadvantages of MLM in this setting. First, as captions tend to be short, in a third of the sentences no token is sampled. Second, the majority of masked tokens are stop-words and punctuation, leading to under-utilization of the image. We investigate a range of alternative masking strategies specific to the cross-modal setting that address these shortcomings, aiming for better fusion of text and image in the learned representation. When pre-training the LXMERT model, our alternative masking strategies consistently improve over the original masking strategy on three downstream tasks, especially in low resource settings. Further, our pre-training approach substantially outperforms the baseline model on a prompt-based probing task designed to elicit image objects. These results and our analysis indicate that our method allows for better utilization of the training data.
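下面给出一种与文中思路同类的替代掩蔽策略的极简示意:只在内容词上采样掩蔽位置,避免掩到停用词和标点,并保证短句至少掩蔽一个token(具体策略细节为假设,非论文原实现):

import random
import string

STOPWORDS = {"a", "an", "the", "is", "are", "on", "in", "of", "and"}

def content_aware_mask(tokens, ratio=0.15, mask_token="[MASK]"):
    """只对内容词掩蔽, 并保证短caption中至少掩蔽一个token。"""
    cand = [i for i, t in enumerate(tokens)
            if t.lower() not in STOPWORDS and t not in string.punctuation]
    if not cand:
        return tokens
    k = max(1, int(len(cand) * ratio))   # 至少1个, 缓解短句"零掩蔽"问题
    out = list(tokens)
    for i in random.sample(cand, k):
        out[i] = mask_token
    return out

print(content_aware_mask("a dog is playing on the grass".split()))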

【5】 Variational Physics Informed Neural Networks: the role of quadratures and test functions 标题:变分物理信息神经网络:求积与测试函数的作用 链接:https://arxiv.org/abs/2109.02035

作者:Stefano Berrone,Claudio Canuto,Moreno Pintore 备注:19 pages, 31 figures 摘要:在这项工作中,我们分析了在求解椭圆边值问题时,不同精度的高斯或牛顿-科茨求积规则以及不同次数的分段多项式测试函数,如何影响变分物理信息神经网络(VPINN)随网格细化的收敛速度。利用基于inf-sup条件的Petrov-Galerkin框架,我们推导了精确解与计算所得神经网络的适当高阶分段插值之间,在能量范数意义下的先验误差估计。数值实验证实了理论预测,同时也表明当不对神经网络进行插值时,误差衰减遵循同样的规律。我们的结果表明(这多少有些违反直觉),对于光滑解,实现高误差衰减率的最佳策略是选择最低多项式次数的测试函数,同时使用足够高精度的求积公式。 摘要:In this work we analyze how Gaussian or Newton-Cotes quadrature rules of different precisions and piecewise polynomial test functions of different degrees affect the convergence rate of Variational Physics Informed Neural Networks (VPINN) with respect to mesh refinement, while solving elliptic boundary-value problems. Using a Petrov-Galerkin framework relying on an inf-sup condition, we derive an a priori error estimate in the energy norm between the exact solution and a suitable high-order piecewise interpolant of a computed neural network. Numerical experiments confirm the theoretical predictions, and also indicate that the error decay follows the same behavior when the neural network is not interpolated. Our results suggest, somehow counterintuitively, that for smooth solutions the best strategy to achieve a high decay rate of the error consists in choosing test functions of the lowest polynomial degree, while using quadrature formulas of suitably high precision.
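为说明"分段多项式测试函数 + Gauss求积"的VPINN损失是如何组装的,下面给出一维Poisson问题 -u''=f 的极简示意(最低次的帽形测试函数、每单元5点求积等均为假设的选择,并非论文设置):

import numpy as np
import torch

net = torch.nn.Sequential(torch.nn.Linear(1, 20), torch.nn.Tanh(), torch.nn.Linear(20, 1))
u = lambda x: x * (1 - x) * net(x)                     # 硬编码边界条件 u(0)=u(1)=0
f = lambda x: np.pi ** 2 * torch.sin(np.pi * x)        # -u'' = f, 精确解 u = sin(pi x)

nodes = torch.linspace(0, 1, 9)                        # 8个单元; 帽形(一次)测试函数
gx, gw = np.polynomial.legendre.leggauss(5)            # 每单元5点Gauss-Legendre求积
gx, gw = torch.tensor(gx, dtype=torch.float32), torch.tensor(gw, dtype=torch.float32)

def residual(k):
    """第k个帽函数v_k的弱形式残差: ∫ u'v' dx - ∫ f v dx。"""
    h = nodes[1] - nodes[0]
    r = 0.0
    for a, b, dv in [(nodes[k-1], nodes[k], 1/h), (nodes[k], nodes[k+1], -1/h)]:
        xq = (0.5*(b-a)*gx + 0.5*(a+b)).reshape(-1, 1).requires_grad_(True)
        wq = (0.5*(b-a)*gw).reshape(-1, 1)
        du = torch.autograd.grad(u(xq).sum(), xq, create_graph=True)[0]
        v = 1 - (xq - nodes[k]).abs() / h              # 帽函数及其分段常数导数dv
        r = r + (wq * (du * dv - f(xq) * v)).sum()
    return r

loss = sum(residual(k) ** 2 for k in range(1, len(nodes) - 1))
# 对该损失用Adam等优化器更新net的参数即可训练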

【6】 A Neural Network-Based Linguistic Similarity Measure for Entrainment in Conversations 标题:一种基于神经网络的会话语言趋同(entrainment)相似度度量方法 链接:https://arxiv.org/abs/2109.01924

作者:Mingzhi Yu,Diane Litman,Shuang Ma,Jian Wu 机构:University of Pittsburgh, Microsoft Corporation, Redmond 摘要:语言趋同(linguistic entrainment)是人们在谈话中倾向于相互模仿的现象。量化这种趋同的核心工具是会话双方之间的语言相似性度量。目前大多数相似性度量基于词袋方法,这类方法依赖语言标记,忽略了整体语言结构和对话语境。为了解决这个问题,我们建议使用神经网络模型来完成趋同的相似性度量。我们的模型是上下文感知的,并进一步利用一个新组件来学习跨对话共享的高级语言特征。我们首先考察该新组件的有效性,然后使用该模型在基于语料库的趋同分析中进行相似性度量。我们在这两项评估任务中都观察到了有希望的结果。 摘要:Linguistic entrainment is a phenomenon where people tend to mimic each other in conversation. The core instrument to quantify entrainment is a linguistic similarity measure between conversational partners. Most of the current similarity measures are based on bag-of-words approaches that rely on linguistic markers, ignoring the overall language structure and dialogue context. To address this issue, we propose to use a neural network model to perform the similarity measure for entrainment. Our model is context-aware, and it further leverages a novel component to learn the shared high-level linguistic features across dialogues. We first investigate the effectiveness of our novel component. Then we use the model to perform similarity measure in a corpus-based entrainment analysis. We observe promising results for both evaluation tasks.
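作为参照,下面是论文所批评的"词袋式"相似度度量的极简实现(论文正是用上下文感知的神经模型替代这种度量;语料为虚构):

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

speaker_a = "i think we should take the early train tomorrow"
speaker_b = "yeah the early train tomorrow works for me i think"

vec = CountVectorizer().fit([speaker_a, speaker_b])
X = vec.transform([speaker_a, speaker_b])
# 词袋余弦相似度: 只看词频重叠, 忽略语序与对话上下文
print(cosine_similarity(X[0], X[1])[0, 0])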

【7】 Estimating the probabilities of causation via deep monotonic twin networks 标题:基于深度单调孪生网络的因果概率估计 链接:https://arxiv.org/abs/2109.01904

作者:Athanasios Vlontzos,Bernhard Kainz,Ciaran M. Gilligan-Lee 机构: BioMedIA, Imperial College London, FAU Erlangen-Nuremberg, Spotify & University College London 备注:7 pages + appendix 摘要:最近有很多工作使用机器学习来回答因果查询,其中大多数集中于介入性查询,如条件平均处理效应。然而,正如Pearl所指出的,介入性查询只是更大的因果查询层次结构的一部分,而反事实位于顶端。尽管如此,我们的社区还没有完全成功地让机器学习工具适用于回答反事实查询。这项工作通过展示如何借助深度学习实现孪生网络反事实推理(溯因(abduction)、行动和预测式反事实推理的一种替代方案)来估计反事实查询,从而应对这一挑战。我们展示了孪生网络的图结构特性如何使它们特别适合深度学习,从而得到简单的神经网络结构,经过训练后能够进行反事实推理。重要的是,我们展示了如何在训练期间强制施加已知的可识别性约束,确保每个反事实查询的答案被唯一确定。我们通过用该方法在合成数据和真实数据上准确估计因果概率(即量化一个事件在多大程度上是另一个事件的必要或充分原因的重要反事实查询)来展示我们的方法。 摘要:There has been much recent work using machine learning to answer causal queries. Most focus on interventional queries, such as the conditional average treatment effect. However, as noted by Pearl, interventional queries only form part of a larger hierarchy of causal queries, with counterfactuals sitting at the top. Despite this, our community has not fully succeeded in adapting machine learning tools to answer counterfactual queries. This work addresses this challenge by showing how to implement twin network counterfactual inference -- an alternative to abduction, action, & prediction counterfactual inference -- with deep learning to estimate counterfactual queries. We show how the graphical nature of twin networks makes them particularly amenable to deep learning, yielding simple neural network architectures that, when trained, are capable of counterfactual inference. Importantly, we show how to enforce known identifiability constraints during training, ensuring the answer to each counterfactual query is uniquely determined. We demonstrate our approach by using it to accurately estimate the probabilities of causation -- important counterfactual queries that quantify the degree to which one event was a necessary or sufficient cause of another -- on both synthetic and real data.
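下面用一个简单的结构因果模型演示孪生网络式反事实推理与必要性概率(PN)的蒙特卡罗估计:事实分支与反事实分支共享同一外生噪声U,仅对X做干预(结构方程与参数均为虚构示例,非论文的深度学习实现):

import numpy as np

rng = np.random.default_rng(0)
N = 100_000
U = rng.uniform(size=N)                               # 共享的外生噪声
f = lambda x, u: ((0.3 * x + u) > 0.8).astype(int)    # 结构方程 Y := f(X, U)

X = rng.integers(0, 2, N)
Y = f(X, U)
Y_cf = f(np.zeros(N, dtype=int), U)                   # 孪生(反事实)分支: do(X=0), U不变

# 必要性概率 PN = P(Y_cf=0 | X=1, Y=1): 若当初未处理, 结果是否就不会发生
m = (X == 1) & (Y == 1)
print("PN ≈", (Y_cf[m] == 0).mean())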

【8】 Frustratingly Simple Pretraining Alternatives to Masked Language Modeling 标题:令人沮丧地简单的掩蔽语言建模预训练替代方案 链接:https://arxiv.org/abs/2109.01819

作者:Atsuki Yamaguchi,George Chrysostomou,Katerina Margatina,Nikolaos Aletras 机构:Research and Development Group, Hitachi, Ltd., Japan, Department of Computer Science, University of Sheffield, United Kingdom 备注:Accepted at EMNLP 2021 摘要:掩蔽语言建模(MLM)是一种自监督的预训练目标,广泛应用于自然语言处理中的文本表征学习。MLM在覆盖整个词表的多分类设置下,训练模型去预测被[MASK]占位符替换的随机一部分输入标记。在预训练时,通常会在token或序列级别将其他辅助目标与MLM一起使用,以提高下游性能(例如下一句预测)。然而,到目前为止,还没有工作检验过其他更简单、语言上直观(或并不直观)的目标能否单独用作主要的预训练目标。在本文中,我们探索了五个基于token级分类任务的简单预训练目标,作为MLM的替代。在GLUE和SQuAD上的实验结果表明,在BERT-BASE架构下,我们提出的方法取得了与MLM相当或更好的性能。我们使用更小的模型进一步验证了我们的方法:预训练仅有BERT-BASE 41%参数的BERT-MEDIUM模型时,在我们的最佳目标下GLUE分数仅下降1%。 摘要:Masked language modeling (MLM), a self-supervised pretraining objective, is widely used in natural language processing for learning text representations. MLM trains a model to predict a random sample of input tokens that have been replaced by a [MASK] placeholder in a multi-class setting over the entire vocabulary. When pretraining, it is common to use alongside MLM other auxiliary objectives on the token or sequence level to improve downstream performance (e.g. next sentence prediction). However, no previous work so far has attempted in examining whether other simpler linguistically intuitive or not objectives can be used standalone as main pretraining objectives. In this paper, we explore five simple pretraining objectives based on token-level classification tasks as replacements of MLM. Empirical results on GLUE and SQuAD show that our proposed methods achieve comparable or better performance to MLM using a BERT-BASE architecture. We further validate our methods using smaller models, showing that pretraining a model with 41% of the BERT-BASE's parameters, BERT-MEDIUM results in only a 1% drop in GLUE scores with our best objective.
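下面示意一种与论文思路同类的"token级分类"预训练目标(具体目标仅为举例,非论文所用的五个目标之一):随机置换部分token,让模型逐token判断其是否被打乱;监督信号来自文本自身,且二分类头与词表大小无关,比MLM的全词表softmax便宜:

import random
import torch
import torch.nn as nn

def make_shuffle_example(token_ids, ratio=0.15):
    """随机置换一部分token, 标签=该位置是否被打乱(0/1)。"""
    ids = list(token_ids)
    pos = random.sample(range(len(ids)), max(2, int(len(ids) * ratio)))
    vals = [ids[p] for p in pos]
    random.shuffle(vals)
    for p, v in zip(pos, vals):
        ids[p] = v
    labels = [int(a != b) for a, b in zip(ids, token_ids)]
    return torch.tensor(ids), torch.tensor(labels)

encoder_out = torch.randn(1, 12, 256)        # 假想的编码器输出 (B, T, H)
head = nn.Linear(256, 2)                     # token级二分类头
ids, labels = make_shuffle_example(list(range(12)))
loss = nn.CrossEntropyLoss()(head(encoder_out).view(-1, 2), labels.view(-1))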

【9】 Acceleration Method for Learning Fine-Layered Optical Neural Networks 标题:一种学习精细分层光学神经网络的加速方法 链接:https://arxiv.org/abs/2109.01731

作者:Kazuo Aoyama,Hiroshi Sawada 备注:9 pages, 9 figures 摘要:光学神经网络(ONN)由于其高速、低功耗的特点,是一种很有前途的系统。其线性单元在光模拟电路中执行输入向量和权重矩阵的乘法。其中,具有可编程马赫-曾德尔干涉仪(MZI)多层结构的电路,能够以有限数量的MZI实现一类特定的酉矩阵作为其权重矩阵。该电路可有效平衡可编程MZI的数量和ONN性能。然而,使用机器学习平台配备的传统自动微分(AD)学习该电路的MZI参数需要花费大量时间。为了解决这个耗时问题,我们提出了一种学习MZI参数的加速方法。我们利用Wirtinger导数和链式法则为MZI创建定制的复值导数,并将其集成到我们新开发的C++函数模块中,以便在多层结构中批量计算它们的值。我们的方法简单、快速、通用,并且与传统AD兼容。我们证明,在基于MZI隐单元的复值循环神经网络中执行逐像素MNIST任务时,我们的方法比传统AD快20倍。 摘要:An optical neural network (ONN) is a promising system due to its high-speed and low-power operation. Its linear unit performs a multiplication of an input vector and a weight matrix in optical analog circuits. Among them, a circuit with a multiple-layered structure of programmable Mach-Zehnder interferometers (MZIs) can realize a specific class of unitary matrices with a limited number of MZIs as its weight matrix. The circuit is effective for balancing the number of programmable MZIs and ONN performance. However, it takes a lot of time to learn MZI parameters of the circuit with a conventional automatic differentiation (AD), which machine learning platforms are equipped with. To solve the time-consuming problem, we propose an acceleration method for learning MZI parameters. We create customized complex-valued derivatives for an MZI, exploiting Wirtinger derivatives and a chain rule. They are incorporated into our newly developed function module implemented in C++ to collectively calculate their values in a multi-layered structure. Our method is simple, fast, and versatile as well as compatible with the conventional AD. We demonstrate that our method works 20 times faster than the conventional AD when a pixel-by-pixel MNIST task is performed in a complex-valued recurrent neural network with an MZI-based hidden unit.

【10】 Communication Efficient Tensor Factorization for Decentralized Healthcare Networks 标题:分散医疗网络的通信有效张量分解 链接:https://arxiv.org/abs/2109.01718

作者:Jing Ma,Qiuchen Zhang,Jian Lou,Li Xiong,Sivasubramanium Bhavani,Joyce C. Ho 机构:Emory University, Atlanta, Georgia, Xidian University, Guangzhou 备注:Short version accepted to IEEE ICDM 2021 摘要:张量因子分解已被证明是一种有效的无监督学习方法,适用于健康数据分析,特别是计算表型分析:在这类任务中,包含患者医疗程序、药物、诊断、实验室检查等病史的高维电子健康记录(EHR),被转化为有意义且可解释的医学概念。联邦张量因子分解在中央服务器的协调下将张量计算分发给多个工作节点,从而能够在保护患者信息隐私的同时,跨多家医院联合学习表型。然而,现有的联邦张量分解算法由于引入中央服务器而存在单点故障问题:服务器不仅容易受到外部攻击,而且在受限的上行带宽下限制了可与其共享信息的客户端数量。在本文中,我们提出了CiderTF,一种通信高效的去中心化广义张量分解方法。它利用为广义张量分解设计的四级通信缩减策略来降低上行链路通信成本,并具有用多种损失函数建模不同张量分布的灵活性。在两个真实EHR数据集上的实验表明,CiderTF在通信量最多减少99.99%的情况下实现了相当的收敛性。 摘要:Tensor factorization has been proved as an efficient unsupervised learning approach for health data analysis, especially for computational phenotyping, where the high-dimensional Electronic Health Records (EHRs) with patients history of medical procedures, medications, diagnosis, lab tests, etc., are converted to meaningful and interpretable medical concepts. Federated tensor factorization distributes the tensor computation to multiple workers under the coordination of a central server, which enables jointly learning the phenotypes across multiple hospitals while preserving the privacy of the patient information. However, existing federated tensor factorization algorithms encounter the single-point-failure issue with the involvement of the central server, which is not only easily exposed to external attacks, but also limits the number of clients sharing information with the server under restricted uplink bandwidth. In this paper, we propose CiderTF, a communication-efficient decentralized generalized tensor factorization, which reduces the uplink communication cost by leveraging a four-level communication reduction strategy designed for a generalized tensor factorization, which has the flexibility of modeling different tensor distribution with multiple kinds of loss functions. Experiments on two real-world EHR datasets demonstrate that CiderTF achieves comparable convergence with the communication reduction up to 99.99%.

【11】 Data science and Machine learning in the Clouds: A Perspective for the Future 标题:云中的数据科学和机器学习:展望未来 链接:https://arxiv.org/abs/2109.01661

作者:Hrishav Bakul Barua 机构:Robotics and Autonomous Systems Research Group, Cognitive Robotics & Vision, TCS Research, Kolkata, India 备注:Preprint submitted for review 摘要:随着科学领域范式转换的开始,数据驱动科学(所谓的第四科学范式)将成为研究和创新的驱动力。从医学到生物多样性,从天文学到地质学,所有这些领域都会在某种程度上受到这种范式转变的影响。在这种新范式下需要处理的海量数据将是未来的一个主要问题,并且在这些计算的所有方面(从存储到计算和其他服务)都将强烈需要基于云的服务。另一个方面是能源消耗,以及预测工作和任务在这样一个科学范式中的表现,这将改变人们看待计算的方式。数据科学对机器学习、信号/图像/视频处理相关算法、人工智能、机器人学、健康信息学、地理信息学等诸多领域产生了重大影响,甚至直接触发了其中一些领域的出现。因此,我们设想一个时代:数据科学可以借助现有的基于云的平台和服务,并辅以新增的服务,来兑现其承诺。在本文中,我们讨论了数据驱动科学和机器学习,以及未来如何通过基于云的服务将它们联系起来;还讨论了近似计算、量子计算等范式在近年的兴起,及其在云环境中的大数据处理、数据科学、分析、预测和机器学习中的适用性。 摘要:As we are fast approaching the beginning of a paradigm shift in the field of science, Data driven science (the so called fourth science paradigm) is going to be the driving force in research and innovation. From medicine to biodiversity and astronomy to geology, all these terms are somehow going to be affected by this paradigm shift. The huge amount of data to be processed under this new paradigm will be a major concern in the future and one will strongly require cloud based services in all the aspects of these computations (from storage to compute and other services). Another aspect will be energy consumption and performance of prediction jobs and tasks within such a scientific paradigm which will change the way one sees computation. Data science has heavily impacted or rather triggered the emergence of Machine Learning, Signal/Image/Video processing related algorithms, Artificial intelligence, Robotics, health informatics, geoinformatics, and many more such areas of interest. Hence, we envisage an era where Data science can deliver its promises with the help of the existing cloud based platforms and services with the addition of new services. In this article, we discuss about data driven science and Machine learning and how they are going to be linked through cloud based services in the future. It also discusses the rise of paradigms like approximate computing, quantum computing and many more in recent times and their applicability in big data processing, data science, analytics, prediction and machine learning in the cloud environments.

【12】 A Farewell to the Bias-Variance Tradeoff? An Overview of the Theory of Overparameterized Machine Learning 标题:告别偏差-方差权衡?过参数化机器学习理论综述 链接:https://arxiv.org/abs/2109.02355

作者:Yehuda Dar,Vidya Muthukumar,Richard G. Baraniuk 机构: Georgia Institute of Tech-nology 摘要:机器学习(ML)最近的快速发展提出了许多科学问题,挑战了该领域长期以来的教条。最重要的谜团之一是过度参数化模型的良好经验推广。过参数化模型相对于训练数据集的大小过于复杂,这导致它们完美地拟合(即插值)训练数据,而训练数据通常是有噪声的。这种噪声数据的插值传统上与有害的过度拟合有关,然而,从简单的线性模型到深度神经网络的各种插值模型最近被观察到在新的测试数据上具有非常好的泛化能力。事实上,最近发现的双下降现象表明,在测试性能方面,高度过参数化的模型往往比最佳欠参数化的模型更好。在这种过度参数化的制度下理解学习需要新的理论和基础实证研究,即使是最简单的线性模型。这种理解的基础已经在最近对过参数化线性回归和相关统计学习任务的分析中奠定,这导致了双下降的精确分析特征。本文简要概述了这一新兴的超参数化ML理论(以下简称TOPML),该理论从统计信号处理的角度解释了这些最新发现。我们强调将TOPML研究领域定义为现代ML理论的一个子领域的独特方面,并概述了仍然存在的有趣的开放性问题。 摘要:The rapid recent progress in machine learning (ML) has raised a number of scientific questions that challenge the longstanding dogma of the field. One of the most important riddles is the good empirical generalization of overparameterized models. Overparameterized models are excessively complex with respect to the size of the training dataset, which results in them perfectly fitting (i.e., interpolating) the training data, which is usually noisy. Such interpolation of noisy data is traditionally associated with detrimental overfitting, and yet a wide range of interpolating models -- from simple linear models to deep neural networks -- have recently been observed to generalize extremely well on fresh test data. Indeed, the recently discovered double descent phenomenon has revealed that highly overparameterized models often improve over the best underparameterized model in test performance. Understanding learning in this overparameterized regime requires new theory and foundational empirical studies, even for the simplest case of the linear model. The underpinnings of this understanding have been laid in very recent analyses of overparameterized linear regression and related statistical learning tasks, which resulted in precise analytic characterizations of double descent. This paper provides a succinct overview of this emerging theory of overparameterized ML (henceforth abbreviated as TOPML) that explains these recent findings through a statistical signal processing perspective. We emphasize the unique aspects that define the TOPML research area as a subfield of modern ML theory and outline interesting open questions that remain.
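文中所说的双下降现象,可以用"随机特征 + 最小范数最小二乘"的小实验直观复现(下面仅为示意;特征构造、噪声水平等均为假设的设置):

import numpy as np

rng = np.random.default_rng(0)
n, n_test, d = 50, 1000, 20
X, Xt = rng.normal(size=(n, d)), rng.normal(size=(n_test, d))
w = rng.normal(size=d)
y = X @ w + 0.5 * rng.normal(size=n)          # 有噪训练标签
yt = Xt @ w

W = rng.normal(size=(d, 400))                  # 固定的随机投影
for p in [5, 10, 25, 45, 50, 55, 100, 400]:    # 特征数跨过插值阈值 p = n = 50
    F, Ft = np.tanh(X @ W[:, :p]), np.tanh(Xt @ W[:, :p])
    beta = np.linalg.pinv(F) @ y               # 最小范数解(过参数化时恰好插值训练数据)
    print(p, round(np.mean((Ft @ beta - yt) ** 2), 3))
# 典型地, 测试误差在 p≈n 附近达到峰值, 随后在过参数化区间再次下降(双下降)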

【13】 FBCNN: A Deep Neural Network Architecture for Portable and Fast Brain-Computer Interfaces 标题:FBCNN:一种便携式快速脑机接口的深度神经网络结构 链接:https://arxiv.org/abs/2109.02165

作者:Pedro R. A. S. Bassi,Romis Attux 机构:Department of Computer Engineering and Industrial Automation, School of Electrical and Computer Engineering, University of Campinas - UNICAMP 摘要:目的:提出一种新的深度神经网络(DNN)结构——滤波器组卷积神经网络(FBCNN),以改进单通道、短数据长度BCI中的SSVEP分类。方法:我们提出两个模型:FBCNN-2D和FBCNN-3D。FBCNN-2D利用滤波器组生成脑电图(EEG)信号的子带分量,使用快速傅立叶变换(FFT)进行变换,然后用2D CNN进行分析。FBCNN-3D使用相同的滤波器组,但它通过短时傅立叶变换(STFT)将子带分量转换为频谱图,并用3D CNN对其进行分析。我们利用了迁移学习。为了训练FBCNN-3D,我们提出了一种称为跨维迁移学习的新技术,将知识从二维DNN迁移到三维DNN。我们的BCI设计为无需最终用户校准:因此,测试受试者的数据与训练和验证数据相互分离。结果:FBCNN-2D和FBCNN-3D的平均测试准确率分别为85.7%和85%,F1平均得分分别为0.858和0.853。替代分类方法SVM、FBCCA和CNN的平均准确率分别为79.2%、80.1%和81.4%。结论:FBCNN在我们的模拟BCI中以相当大的幅度(准确率高出约5%)超过了传统的SSVEP分类方法。迁移学习和跨维迁移学习使训练更快、更可预测。意义:我们提出了一种新的、灵活的DNN类型,在面向便携式快速BCI的SSVEP分类中,其性能优于标准方法。 摘要:Objective: To propose a novel deep neural network (DNN) architecture -- the filter bank convolutional neural network (FBCNN) -- to improve SSVEP classification in single-channel BCIs with small data lengths. Methods: We propose two models: the FBCNN-2D and the FBCNN-3D. The FBCNN-2D utilizes a filter bank to create sub-band components of the electroencephalography (EEG) signal, which it transforms using the fast Fourier transform (FFT) and analyzes with a 2D CNN. The FBCNN-3D utilizes the same filter bank, but it transforms the sub-band components into spectrograms via short-time Fourier transform (STFT), and analyzes them with a 3D CNN. We made use of transfer learning. To train the FBCNN-3D, we proposed a new technique, called inter-dimensional transfer learning, to transfer knowledge from a 2D DNN to a 3D DNN. Our BCI was conceived so as not to require calibration from the final user: therefore, the test subject data was separated from training and validation. Results: The mean test accuracy was 85.7% for the FBCNN-2D and 85% for the FBCNN-3D. Mean F1-Scores were 0.858 and 0.853. Alternative classification methods, SVM, FBCCA and a CNN, had mean accuracy of 79.2%, 80.1% and 81.4%, respectively. Conclusion: The FBCNNs surpassed traditional SSVEP classification methods in our simulated BCI, by a considerable margin (about 5% higher accuracy). Transfer learning and inter-dimensional transfer learning made training much faster and more predictable. Significance: We proposed a new and flexible type of DNN, which had a better performance than standard methods in SSVEP classification for portable and fast BCIs.
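下面用scipy示意FBCNN输入端"滤波器组 + FFT"预处理的大致流程:把单通道EEG分成若干子带,分别取幅度谱(采样率、滤波器组的子带边界与阶数均为假设的参数,非论文设定):

import numpy as np
from scipy.signal import butter, filtfilt

fs = 250                                    # 采样率(Hz, 假设)
t = np.arange(0, 2, 1 / fs)
eeg = np.sin(2 * np.pi * 12 * t) + 0.5 * np.random.randn(t.size)  # 模拟12Hz SSVEP

subbands = [(6, 18), (14, 26), (22, 34)]    # 滤波器组子带(假设的边界)
features = []
for lo, hi in subbands:
    b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
    x = filtfilt(b, a, eeg)                 # 零相位带通滤波
    spec = np.abs(np.fft.rfft(x))           # FFT幅度谱, 作为CNN的输入之一
    features.append(spec)
print(np.stack(features).shape)             # (子带数, 频点数)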

【14】 (M)SLAe-Net: Multi-Scale Multi-Level Attention embedded Network for Retinal Vessel Segmentation 标题:(M)SLAe-Net:用于视网膜血管分割的多尺度、多层次注意力嵌入网络 链接:https://arxiv.org/abs/2109.02084

作者:Shreshth Saini,Geetika Agrawal 机构:Department of Electrical Engineering, Indian Institute of Technology Jodhpur 备注:5 pages, 4 figures, Accepted and Presented in 9TH IEEE INTERNATIONAL CONFERENCE ON HEALTHCARE INFORMATICS (IEEE-ICHI 2021), Victoria, British Columbia, Canada 摘要:分割在诊断中起着至关重要的作用。从眼底图像研究视网膜血管有助于识别许多重要疾病(如糖尿病视网膜病变)的早期迹象。由于视网膜血管的形状、大小和模式各异,加之眼底图像中的伪影和噪声,没有一种单阶段方法能够准确地分割视网膜血管。在这项工作中,我们提出了一种多尺度、多层次注意力嵌入的CNN体系结构((M)SLAe-Net),以解决需要多阶段处理才能稳健、精确分割视网膜血管的问题。我们通过在网络的多个尺度和多个层次上提取特征来实现这一点,使模型能够整体地提取局部和全局特征。多尺度特征使用我们新颖的动态空洞金字塔池化(D-DPP)模块提取;我们还聚合了来自所有网络层次的特征。这些设计有效解决了形状各异和伪影的问题,从而消除了多阶段处理的必要。为了帮助更好地进行像素级分类,我们在网络中使用了挤压-注意(SA)模块,即针对分割任务对挤压-激励(SE)模块所做的巧妙改造,以促进像素组注意力。我们独特的网络设计和新颖的D-DPP模块,以及针对细血管的高效任务专用损失函数,使我们的模型具有更好的跨数据集性能。在DRIVE、STARE、HRF和CHASE-DB1上的详尽实验结果表明了该方法的优越性。 摘要:Segmentation plays a crucial role in diagnosis. Studying the retinal vasculatures from fundus images help identify early signs of many crucial illnesses such as diabetic retinopathy. Due to the varying shape, size, and patterns of retinal vessels, along with artefacts and noises in fundus images, no one-stage method can accurately segment retinal vessels. In this work, we propose a multi-scale, multi-level attention embedded CNN architecture ((M)SLAe-Net) to address the issue of multi-stage processing for robust and precise segmentation of retinal vessels. We do this by extracting features at multiple scales and multiple levels of the network, enabling our model to holistically extracts the local and global features. Multi-scale features are extracted using our novel dynamic dilated pyramid pooling (D-DPP) module. We also aggregate the features from all the network levels. These effectively resolved the issues of varying shapes and artefacts and hence the need for multiple stages. To assist in better pixel-level classification, we use the Squeeze and Attention(SA) module, a smartly adapted version of the Squeeze and Excitation(SE) module for segmentation tasks in our network to facilitate pixel-group attention. Our unique network design and novel D-DPP module with efficient task-specific loss function for thin vessels enabled our model for better cross data performance. Exhaustive experimental results on DRIVE, STARE, HRF, and CHASE-DB1 show the superiority of our method.
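下面是"空洞金字塔池化"一类模块的PyTorch示意:并行使用不同空洞率的卷积后拼接融合(这只是该类结构的简化版本,论文D-DPP中的动态机制从略,空洞率等参数为假设):

import torch
import torch.nn as nn

class DilatedPyramid(nn.Module):
    """并行多空洞率3x3卷积, 捕获粗细不一的血管结构(简化示意)。"""
    def __init__(self, cin, cout, rates=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(cin, cout, 3, padding=r, dilation=r) for r in rates
        )
        self.fuse = nn.Conv2d(cout * len(rates), cout, 1)  # 1x1卷积融合各分支

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = DilatedPyramid(32, 32)(torch.randn(1, 32, 64, 64))  # -> (1, 32, 64, 64)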

【15】 Deep learning facilitates fully automated brain image registration of optoacoustic tomography and magnetic resonance imaging 标题:深度学习促进光声断层扫描和磁共振成像的全自动脑图像配准 链接:https://arxiv.org/abs/2109.01880

作者:Yexing Hu,Berkan Lafci,Artur Luzgin,Hao Wang,Jan Klohs,Xose Luis Dean-Ben,Ruiqing Ni,Daniel Razansky,Wuwei Ren 机构:School of Information Science and Technology, ShanghaiTech University, Shanghai, China, Institute for Biomedical Engineering and Institute of Pharmacology and Toxicology, University of Zürich, Zürich, Switzerland 备注:15 pages, 5 figures 摘要:多光谱光声层析成像(MSOT)是一种新兴的光学成像方法,可从啮齿类动物大脑中提供多重分子和功能信息。磁共振成像(MRI)能提供极好的软组织对比度和高分辨率的大脑解剖结构,可极大地增强MSOT。然而,多模态图像的配准仍然具有挑战性,主要是因为这些模态呈现的图像对比度完全不同。先前报道的配准算法大多依赖人工的、依赖使用者的大脑分割,这会影响数据解释和精确量化。在这里,我们提出了一种由深度学习驱动的、面向MSOT-MRI多模态成像的全自动配准方法。自动工作流程包括基于神经网络的图像分割以生成合适的掩模,随后使用另一个神经网络对其进行配准。该算法的性能通过横断面MSOT和高场MRI临床前扫描仪获得的数据集进行了展示。该自动配准方法进一步与手动和半自动配准进行了对照验证,证明了其鲁棒性和准确性。 摘要:Multi-spectral optoacoustic tomography (MSOT) is an emerging optical imaging method providing multiplex molecular and functional information from the rodent brain. It can be greatly augmented by magnetic resonance imaging (MRI) that offers excellent soft-tissue contrast and high-resolution brain anatomy. Nevertheless, registration of multi-modal images remains challenging, chiefly due to the entirely different image contrast rendered by these modalities. Previously reported registration algorithms mostly relied on manual user-dependent brain segmentation, which compromised data interpretation and accurate quantification. Here we propose a fully automated registration method for MSOT-MRI multimodal imaging empowered by deep learning. The automated workflow includes neural network-based image segmentation to generate suitable masks, which are subsequently registered using an additional neural network. Performance of the algorithm is showcased with datasets acquired by cross-sectional MSOT and high-field MRI preclinical scanners. The automated registration method is further validated with manual and half-automated registration, demonstrating its robustness and accuracy.

【16】 Model retraining and information sharing in a supply chain with long-term fluctuating demands 标题:需求长期波动的供应链模型再训练与信息共享 链接:https://arxiv.org/abs/2109.01784

作者:Takahiro Ezaki,Naoto Imura,Katsuhiro Nishinari 机构:Research Center for Advanced Science and Technology, The University of Tokyo,-,-, Komaba, Meguro-ku, Tokyo ,-, Japan 摘要:基于经验数据的需求预测是优化供应链的可行方法。然而,在这种方法中,由于环境的长期变化,根据过去数据构建的模型偶尔会过时,在这种情况下,应使用最新数据更新(即重新训练)模型。在这项研究中,我们检验了在供应链中使用最小设置更新模型的效果。我们证明了当供应链中的每一方都有自己的预测模型时,即使采用非常简单的补货策略,不协调的模型再训练也会导致牛鞭效应。我们的结果还表明,参与各方共享预测模型可以显著降低牛鞭效应。 摘要:Demand forecasting based on empirical data is a viable approach for optimizing a supply chain. However, in this approach, a model constructed from past data occasionally becomes outdated due to long-term changes in the environment, in which case the model should be updated (i.e., retrained) using the latest data. In this study, we examine the effects of updating models in a supply chain using a minimal setting. We demonstrate that when each party in the supply chain has its own forecasting model, uncoordinated model retraining causes the bullwhip effect even if a very simple replenishment policy is applied. Our results also indicate that sharing the forecasting model among the parties involved significantly reduces the bullwhip effect.
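下面用一个两级供应链的极简模拟示意牛鞭效应的度量方式:每一级用移动平均预测需求并按"补足到目标库存"策略下单,然后比较各级订单方差与终端需求方差(所有参数均为假设;这里未实现论文所研究的模型再训练机制,仅演示方差放大的现象本身):

import numpy as np

rng = np.random.default_rng(1)
T = 500
demand = 20 + rng.normal(0, 2, T)             # 终端需求

def echelon(incoming, window=5, lead=2):
    """一级节点: 移动平均预测 + order-up-to 补货策略(简化, 忽略到货延迟)。"""
    orders, inv = [], 40.0
    for t in range(T):
        inv -= incoming[t]
        hist = incoming[max(0, t - window + 1):t + 1]
        target = (lead + 1) * np.mean(hist)   # 目标库存覆盖提前期内的预测需求
        q = max(0.0, target - inv)
        inv += q
        orders.append(q)
    return np.array(orders)

o1 = echelon(demand)                           # 零售商向批发商的订单
o2 = echelon(o1)                               # 批发商向工厂的订单
# 订单方差相对终端需求方差的放大倍数通常逐级增大, 即牛鞭效应
print([round(np.var(x) / np.var(demand), 2) for x in (o1, o2)])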

其他(27篇)

【1】 A Method for Inferring Polymers Based on Linear Regression and Integer Programming 标题:一种基于线性回归和整数规划的聚合物推断方法 链接:https://arxiv.org/abs/2109.02628

作者:Ryota Ido,Shengjuan Cao,Jianshen Zhu,Naveed Ahmed Azam,Kazuya Haraguchi,Liang Zhao,Hiroshi Nagamochi,Tatsuya Akutsu 机构:Department of Applied Mathematics and Physics, Kyoto University, Kyoto ,-, Japan, Graduate School of Advanced Integrated Studies in Human Survavibility (Shishu-Kan), Kyoto University 备注:arXiv admin note: substantial text overlap with arXiv:2107.02381; text overlap with arXiv:2108.10266 摘要:最近,人们提出了一种新的框架,利用人工神经网络和混合整数线性规划设计具有所需化学性质的化合物的分子结构。在本文中,我们设计了一种基于该框架推断聚合物的新方法。为此,我们介绍了一种将聚合物表示为单体形式的新方法,并定义了表征聚合物结构的新描述符。我们还使用线性回归作为构建框架中预测函数的构建块。我们的计算实验结果揭示了一组聚合物的化学性质,其中用线性回归构造的预测函数表现良好。我们还观察到,所提出的方法可以推断出单体形式中含有多达50个非氢原子的聚合物。 摘要:A novel framework has recently been proposed for designing the molecular structure of chemical compounds with a desired chemical property using both artificial neural networks and mixed integer linear programming. In this paper, we design a new method for inferring a polymer based on the framework. For this, we introduce a new way of representing a polymer as a form of monomer and define new descriptors that feature the structure of polymers. We also use linear regression as a building block of constructing a prediction function in the framework. The results of our computational experiments reveal a set of chemical properties on polymers to which a prediction function constructed with linear regression performs well. We also observe that the proposed method can infer polymers with up to 50 non-hydrogen atoms in a monomer form.

【2】 Gaussian Process Uniform Error Bounds with Unknown Hyperparameters for Safety-Critical Applications 标题:安全关键应用中具有未知超参数的高斯过程一致误差界 链接:https://arxiv.org/abs/2109.02606

作者:Alexandre Capone,Armin Lederer,Sandra Hirche 机构:Chair of Information-Oriented Control, Department of Electrical and Computing Engineering, Technical University of Munich 摘要:由于后验方差可用于直接估计模型误差和量化风险,高斯过程已成为各种安全关键设置的一种有前途的工具。然而,用于安全关键设置的最新技术取决于内核超参数已知的假设,这在一般情况下并不适用。为了缓解这种情况,我们在具有未知超参数的设置中引入了鲁棒高斯过程一致误差界。我们的方法计算了超参数空间中的置信域,这使我们能够获得具有任意超参数的高斯过程的模型误差的概率上界。我们不需要事先知道超参数的任何界,这是相关工作中常见的假设。相反,我们能够以直观的方式从数据中导出边界。此外,我们还利用所提出的技术来推导一类基于学习的控制问题的性能保证。实验表明,该方法的性能明显优于一般的和完全贝叶斯高斯过程。 摘要:Gaussian processes have become a promising tool for various safety-critical settings, since the posterior variance can be used to directly estimate the model error and quantify risk. However, state-of-the-art techniques for safety-critical settings hinge on the assumption that the kernel hyperparameters are known, which does not apply in general. To mitigate this, we introduce robust Gaussian process uniform error bounds in settings with unknown hyperparameters. Our approach computes a confidence region in the space of hyperparameters, which enables us to obtain a probabilistic upper bound for the model error of a Gaussian process with arbitrary hyperparameters. We do not require to know any bounds for the hyperparameters a priori, which is an assumption commonly found in related work. Instead, we are able to derive bounds from data in an intuitive fashion. We additionally employ the proposed technique to derive performance guarantees for a class of learning-based control problems. Experiments show that the bound performs significantly better than vanilla and fully Bayesian Gaussian processes.
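论文的核心思想(在超参数的置信域上对模型误差上界取最坏情形)可以用sklearn粗略示意:对长度尺度候选集内的每个GP都计算 mu(x)+beta*sigma(x) 型上界,再取逐点最大值(beta的取值与候选集均为假设;论文中的置信域构造与概率保证此处从略):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
Xs = np.linspace(-3, 3, 200).reshape(-1, 1)

beta = 2.0
upper = np.full(200, -np.inf)
for ls in [0.2, 0.5, 1.0, 2.0]:               # 超参数"置信域"的离散化(假设)
    gp = GaussianProcessRegressor(kernel=RBF(length_scale=ls),
                                  optimizer=None, alpha=0.01).fit(X, y)
    mu, std = gp.predict(Xs, return_std=True)
    upper = np.maximum(upper, mu + beta * std)  # 对所有候选超参数取最坏情形上界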

【3】 Going Beyond Neural Architecture Search with Sampling-based Neural Ensemble Search 标题:基于采样的神经集成搜索超越神经结构搜索 链接:https://arxiv.org/abs/2109.02533

作者:Yao Shu,Yizhou Chen,Zhongxiang Dai,Bryan Kian Hsiang Low 机构: Dept. of Computer Science, National University of Singapore, Republic of Singapore 摘要:近年来,神经结构搜索(NAS)被广泛应用于深度神经网络的自动化设计。人们提出了各种NAS算法来降低搜索成本,提高最终所选体系结构的泛化性能。然而,这些NAS算法旨在从搜索空间中只选择单一的神经结构,因此忽略了其他候选结构在帮助提升最终所选结构性能方面的能力。为此,我们在提出的基于采样的神经集成搜索(NESS)框架下给出了两种新的采样算法,它们能够高效且有效地从NAS搜索空间中选出性能良好的神经结构集成。与最先进的NAS算法和其他著名的集成搜索基线相比,我们的NESS算法在各种基准数据集上的分类和对抗防御任务中都取得了更好的性能,同时搜索成本与这些NAS算法相当。 摘要:Recently, Neural Architecture Search (NAS) has been widely applied to automate the design of deep neural networks. Various NAS algorithms have been proposed to reduce the search cost and improve the generalization performance of those final selected architectures. However, these NAS algorithms aim to select only a single neural architecture from the search spaces and thus have overlooked the capability of other candidate architectures in helping improve the performance of their final selected architecture. To this end, we present two novel sampling algorithms under our Neural Ensemble Search via Sampling (NESS) framework that can effectively and efficiently select a well-performing ensemble of neural architectures from NAS search space. Compared with state-of-the-art NAS algorithms and other well-known ensemble search baselines, our NESS algorithms are shown to be able to achieve improved performance in both classification and adversarial defense tasks on various benchmark datasets while incurring a comparable search cost to these NAS algorithms.

【4】 Error Controlled Actor-Critic 标题:误差控制的Actor-Critic 链接:https://arxiv.org/abs/2109.02517

作者:Xingen Gao,Fei Chao,Changle Zhou,Zhen Ge,Chih-Min Lin,Longzhi Yang,Xiang Chang,Changjing Shang 机构:Department of Artificial Intelligence, Xiamen University, Xiamen, China., University of Technology Sydney, Sydney, Australia, Department of Electrical Engineering, Yuan Ze University, Taiwan, China, Computer Science and Digital Technologies Department 摘要:值函数的误差会不可避免地引起高估现象,并对算法的收敛性产生负面影响。为了减轻逼近误差的负面影响,我们提出了误差控制Actor-Critic,以确保值函数中的逼近误差受到限制。我们分析了逼近误差如何阻碍actor-critic方法的优化过程;随后推导了Q函数逼近器逼近误差的一个上界,并发现在训练策略时,通过限制每两个连续策略之间的KL散度可以降低该误差。在一系列连续控制任务上的实验结果表明,所提出的actor-critic算法显著降低了逼近误差,并明显优于其他无模型RL算法。 摘要:The error of the value function inevitably causes an overestimation phenomenon and has a negative impact on the convergence of the algorithms. To mitigate the negative effects of the approximation error, we propose Error Controlled Actor-critic which ensures confining the approximation error in value function. We present an analysis of how the approximation error can hinder the optimization process of actor-critic methods. Then, we derive an upper boundary of the approximation error of Q function approximator and find that the error can be lowered by restricting on the KL-divergence between every two consecutive policies when training the policy. The results of experiments on a range of continuous control tasks demonstrate that the proposed actor-critic algorithm apparently reduces the approximation error and significantly outperforms other model-free RL algorithms.
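"限制相邻两次策略之间KL散度"的做法可以示意如下(离散动作、损失的具体形式与系数均为假设,仅展示KL约束项的常见写法,并非论文算法本身):

import torch
import torch.nn.functional as F

def policy_loss_with_kl(logits_new, logits_old, advantages, actions, kl_coef=0.5):
    """策略梯度损失 + 新旧策略间KL惩罚, 用以限制策略更新幅度。"""
    logp_new = F.log_softmax(logits_new, dim=-1)
    logp_old = F.log_softmax(logits_old, dim=-1).detach()
    pg = -(advantages * logp_new.gather(1, actions.unsqueeze(1)).squeeze(1)).mean()
    kl = F.kl_div(logp_new, logp_old.exp(), reduction="batchmean")  # KL(旧策略 || 新策略)
    return pg + kl_coef * kl

loss = policy_loss_with_kl(torch.randn(32, 4), torch.randn(32, 4),
                           torch.randn(32), torch.randint(0, 4, (32,)))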

【5】 Uniform Manifold Approximation and Projection (UMAP) and its Variants: Tutorial and Survey 标题:统一流形逼近和投影(UMAP)及其变体:教程和综述 链接:https://arxiv.org/abs/2109.02508

作者:Benyamin Ghojogh,Ali Ghodsi,Fakhri Karray,Mark Crowley 机构:Department of Electrical and Computer Engineering, Machine Learning Laboratory, University of Waterloo, Waterloo, ON, Canada, Department of Statistics and Actuarial Science & David R. Cheriton School of Computer Science 备注:To appear as a part of an upcoming textbook on dimensionality reduction and manifold learning 摘要:统一流形近似与投影(UMAP)是一种用于降维和数据可视化的最新方法。这是一篇关于UMAP及其变体的教程和调查论文。我们从UMAP算法开始,解释了输入和嵌入空间中邻域的概率、代价函数的优化、训练算法、梯度的推导以及UMAP的监督和半监督嵌入。然后,我们通过代数拓扑和范畴理论介绍了UMAP背后的理论。然后,我们介绍了UMAP作为一种邻域嵌入方法,并将其与t-SNE和LargeVis算法进行了比较。我们讨论了UMAP代价函数中的负采样和排斥力。然后解释了密度保持嵌入的DensMAP。然后,我们通过深度学习引入参数化UMAP进行嵌入,并引入渐进式UMAP进行流式和样本外数据嵌入。 摘要:Uniform Manifold Approximation and Projection (UMAP) is one of the state-of-the-art methods for dimensionality reduction and data visualization. This is a tutorial and survey paper on UMAP and its variants. We start with UMAP algorithm where we explain probabilities of neighborhood in the input and embedding spaces, optimization of cost function, training algorithm, derivation of gradients, and supervised and semi-supervised embedding by UMAP. Then, we introduce the theory behind UMAP by algebraic topology and category theory. Then, we introduce UMAP as a neighbor embedding method and compare it with t-SNE and LargeVis algorithms. We discuss negative sampling and repulsive forces in UMAP's cost function. DensMAP is then explained for density-preserving embedding. We then introduce parametric UMAP for embedding by deep learning and progressive UMAP for streaming and out-of-sample data embedding.
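教程所述的UMAP用法可以用umap-learn几行完成(监督式嵌入只需额外传入y;densmap参数即文中的DensMAP,需umap-learn 0.5及以上版本):

import umap
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
emb = umap.UMAP(n_neighbors=15, min_dist=0.1, densmap=False).fit_transform(X)
emb_sup = umap.UMAP(n_neighbors=15).fit_transform(X, y=y)   # 监督式UMAP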

【6】 TraverseNet: Unifying Space and Time in Message Passing 标题:TraverseNet:在消息传递中统一空间和时间 链接:https://arxiv.org/abs/2109.02474

作者:Zonghan Wu,Da Zheng,Shirui Pan,Quan Gan,Guodong Long,George Karypis 机构: Pan is with Department of Data Science and AI, Monash University 摘要:本文旨在统一非欧几里德空间中的空间依赖和时间依赖,同时捕获时空图数据的内在时空依赖。对于具有拓扑结构的时空属性实体,时空是连续统一的,每个节点的当前状态受其各邻居在不同时间段的过去状态的影响。大多数时空神经网络在处理中分别研究空间依赖和时间相关性,严重割裂了时空连续体,并忽略了邻居对某一节点的时间依赖周期可能是延迟且动态变化的这一事实。为了建模这种真实情形,我们提出了TraverseNet,一种新颖的时空图神经网络,它将空间和时间视为一个不可分割的整体来挖掘时空图,同时通过消息遍历机制利用每个节点不断演化的时空依赖。消融实验和参数研究验证了所提出的TraverseNet的有效性,详细实现参见 https://github.com/nnzhan/TraverseNet 。 摘要:This paper aims to unify spatial dependency and temporal dependency in a non-Euclidean space while capturing the inner spatial-temporal dependencies for spatial-temporal graph data. For spatial-temporal attribute entities with topological structure, the space-time is consecutive and unified while each node's current status is influenced by its neighbors' past states over variant periods of each neighbor. Most spatial-temporal neural networks study spatial dependency and temporal correlation separately in processing, gravely impaired the space-time continuum, and ignore the fact that the neighbors' temporal dependency period for a node can be delayed and dynamic. To model this actual condition, we propose TraverseNet, a novel spatial-temporal graph neural network, viewing space and time as an inseparable whole, to mine spatial-temporal graphs while exploiting the evolving spatial-temporal dependencies for each node via message traverse mechanisms. Experiments with ablation and parameter studies have validated the effectiveness of the proposed TraverseNets, and the detailed implementation can be found from https://github.com/nnzhan/TraverseNet.

【7】 Data Science Kitchen at GermEval 2021: A Fine Selection of Hand-Picked Features, Delivered Fresh from the Oven 标题:GermEval 2021年的数据科学厨房:手工挑选的精选功能,新鲜出炉 链接:https://arxiv.org/abs/2109.02383

作者:Niclas Hildebrandt,Benedikt Boenninghoff,Dennis Orth,Christopher Schymura 机构:Data Science Kitchen 备注:Accepted at 17th Conference on Natural Language Processing (KONVENS 2021) 摘要:本文介绍了Data Science Kitchen团队在GermEval 2021共享任务"识别有毒、引人入胜和声称事实的评论"中的参赛工作。这项任务旨在扩展攻击性语言识别,增加了识别应由版主和社区管理者优先进行事实核查的评论等子任务。我们的贡献集中于特征工程方法与传统分类后端的结合:我们将从预训练深度神经网络中提取的语义和写作风格嵌入,与专为此任务设计的额外数值特征相结合;并通过多数投票方案,使用逻辑回归分类器和支持向量机的集成得出每个子任务的预测。我们的最佳提交在识别有毒、引人入胜和声称事实的评论上分别获得了66.8%、69.9%和72.5%的宏平均F1分数。 摘要:This paper presents the contribution of the Data Science Kitchen at GermEval 2021 shared task on the identification of toxic, engaging, and fact-claiming comments. The task aims at extending the identification of offensive language, by including additional subtasks that identify comments which should be prioritized for fact-checking by moderators and community managers. Our contribution focuses on a feature-engineering approach with a conventional classification backend. We combine semantic and writing style embeddings derived from pre-trained deep neural networks with additional numerical features, specifically designed for this task. Ensembles of Logistic Regression classifiers and Support Vector Machines are used to derive predictions for each subtask via a majority voting scheme. Our best submission achieved macro-averaged F1-scores of 66.8%, 69.9% and 72.5% for the identification of toxic, engaging, and fact-claiming comments.
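"逻辑回归 + SVM多数投票集成"部分可以用sklearn直接搭出骨架(特征与数据为占位,写作风格/语义嵌入等特征的构建从略):

from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=300, n_features=20, random_state=0)  # 占位特征
clf = VotingClassifier(
    estimators=[("lr", LogisticRegression(max_iter=1000)), ("svm", SVC())],
    voting="hard",                      # 多数投票
)
print(clf.fit(X[:200], y[:200]).score(X[200:], y[200:]))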

【8】 Improved RAMEN: Towards Domain Generalization for Visual Question Answering 标题:改进的RAMEN:面向视觉问答的领域泛化 链接:https://arxiv.org/abs/2109.02370

作者:Bhanuka Manesha Samarasekara Vitharana Gamage,Lim Chern Hong 机构:School of Information Technology, Monash University, Bandar Sunway, Malaysia 备注:11 pages, 3 figures, 2 tables 摘要:目前,视觉问答(VQA)已接近人类水平,是人工智能领域的一个新兴方向。作为机器学习中的一个多学科领域,计算机视觉和自然语言处理社区正在共同努力实现最先进(SOTA)的性能。然而,由于缺乏模型泛化能力,SOTA结果与实际应用之间仍存在差距。RAMEN模型 [Shrestha2019] 的目标是通过在两种主要类型的VQA数据集上取得最高分数来实现领域泛化。本研究对RAMEN架构的早期/晚期融合模块和聚合模块进行了两项主要改进,目的是进一步加强领域泛化:融合模块引入了基于向量运算的融合策略,聚合模块引入了transformer架构。实验显示在多达五个VQA数据集上均有明显改进。基于这些结果,本研究分析了两项改进对领域泛化问题的影响。代码已在GitHub开源:https://github.com/bhanukaManesha/ramen 。 摘要:Currently nearing human-level performance, Visual Question Answering (VQA) is an emerging area in artificial intelligence. Established as a multi-disciplinary field in machine learning, both computer vision and natural language processing communities are working together to achieve state-of-the-art (SOTA) performance. However, there is a gap between the SOTA results and real world applications. This is due to the lack of model generalisation. The RAMEN model [Shrestha2019] aimed to achieve domain generalization by obtaining the highest score across two main types of VQA datasets. This study provides two major improvements to the early/late fusion module and aggregation module of the RAMEN architecture, with the objective of further strengthening domain generalization. Vector operations based fusion strategies are introduced for the fusion module and the transformer architecture is introduced for the aggregation module. Improvements of up to five VQA datasets from the experiments conducted are evident. Following the results, this study analyses the effects of both the improvements on the domain generalization problem. The code is available on GitHub though the following link https://github.com/bhanukaManesha/ramen.
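"基于向量运算的融合策略"可示意为对图像/问题嵌入做逐元素运算后拼接(具体采用哪些运算组合为假设,非论文的确切配置):

import torch

def vector_op_fusion(img, qst):
    """融合视觉与问题特征: 拼接逐元素积/差/和等向量运算的结果。"""
    return torch.cat([img * qst, img - qst, img + qst], dim=-1)

fused = vector_op_fusion(torch.randn(8, 512), torch.randn(8, 512))  # -> (8, 1536)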

【9】 Tensor Normalization and Full Distribution Training 标题:张量归一化与全分布训练 链接:https://arxiv.org/abs/2109.02345

作者:Wolfgang Fuhl 机构:University Tübingen 摘要:在这项工作中,我们引入了像素级张量归一化:它插在修正线性单元(ReLU)之后,与批量归一化一起,显著提高了现代深度神经网络的精度。此外,本文还研究了网络的鲁棒性。我们证明,对训练集中的图像进行因子化叠加,并将多类问题重新表述为多标签问题,可以得到明显更稳健的网络。与仅以单个类别作为标签的叠加相比,对多类对数损失(log loss)的重新表述和调整也改善了结果。https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FTNandFDT&mode=list 摘要:In this work, we introduce pixel wise tensor normalization, which is inserted after rectifier linear units and, together with batch normalization, provides a significant improvement in the accuracy of modern deep neural networks. In addition, this work deals with the robustness of networks. We show that the factorized superposition of images from the training set and the reformulation of the multi class problem into a multi-label problem yields significantly more robust networks. The reformulation and the adjustment of the multi class log loss also improves the results compared to the overlay with only one class as label. https://atreus.informatik.uni-tuebingen.de/seafile/d/8e2ab8c3fdd444e1a135/?p=%2FTNandFDT&mode=list
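"图像叠加 + 多标签重表述"的训练思路可示意如下(叠加系数与实现细节为假设的简化,非论文原版的因子化叠加):

import torch
import torch.nn.functional as F

def superpose(x1, y1, x2, y2, n_classes, alpha=0.5):
    """叠加两张图, 标签改为多热(multi-hot), 用BCE代替多类对数损失。"""
    x = alpha * x1 + (1 - alpha) * x2
    y = torch.zeros(n_classes).scatter_(0, torch.tensor([y1, y2]), 1.0)
    return x, y

x, y = superpose(torch.randn(3, 32, 32), 1, torch.randn(3, 32, 32), 7, n_classes=10)
logits = torch.randn(10)                        # 假想的网络输出
loss = F.binary_cross_entropy_with_logits(logits, y)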

【10】 Information Theory-Guided Heuristic Progressive Multi-View Coding 标题:信息论指导的启发式渐进多视图编码 链接:https://arxiv.org/abs/2109.02344

作者:Jiangmeng Li,Wenwen Qiang,Hang Gao,Bing Su,Farid Razzak,Jie Hu,Changwen Zheng,Hui Xiong 机构: Institute of Software Chinese Academy of Sciences , University of Chinese Academy of Sciences, Renmin University of China , New York University , Bytedance, Artificial Intelligence Thrust, The Hong Kong University of Science and Technology 摘要:多视图表征学习从共享上下文的多个视图中获取综合信息。最近的工作直观地将对比学习(CL)以成对的方式应用于表征学习,但仍有不足之处:在学习视图共享表征时没有过滤视图特有的噪声;假负对(负样本实际上与正样本同类)与真负对被同等对待;而对样本间相似性的均匀度量可能干扰优化。重要的是,很少有工作研究广义自监督多视图学习的理论框架,尤其是针对两个以上视图的情形。为此,我们从信息论的角度重新思考了现有的多视图学习范式,并提出了一个面向广义多视图学习的新信息论框架。在其指导下,我们构建了一种具有三层渐进结构的多视图编码方法,即信息论指导的启发式渐进多视图编码(IPMC)。在分布层中,IPMC对齐视图之间的分布以减少视图特有噪声;在集合层中,IPMC构建用于对比的自调整样本池,利用视图过滤器自适应地修改样本池;最后,在实例层中,我们采用设计好的统一损失来学习判别性表征并减少梯度干扰。从理论和实证两方面,我们都证明了IPMC优于最先进的方法。 摘要:Multi-view representation learning captures comprehensive information from multiple views of a shared context. Recent works intuitively apply contrastive learning (CL) to learn representations, regarded as a pairwise manner, which is still scalable: view-specific noise is not filtered in learning view-shared representations; the fake negative pairs, where the negative terms are actually within the same class as the positive, and the real negative pairs are coequally treated; and evenly measuring the similarities between terms might interfere with optimization. Importantly, few works research the theoretical framework of generalized self-supervised multi-view learning, especially for more than two views. To this end, we rethink the existing multi-view learning paradigm from the information theoretical perspective and then propose a novel information theoretical framework for generalized multi-view learning. Guided by it, we build a multi-view coding method with a three-tier progressive architecture, namely Information theory-guided heuristic Progressive Multi-view Coding (IPMC). In the distribution-tier, IPMC aligns the distribution between views to reduce view-specific noise. In the set-tier, IPMC builds self-adjusted pools for contrasting, which utilizes a view filter to adaptively modify the pools. Lastly, in the instance-tier, we adopt a designed unified loss to learn discriminative representations and reduce the gradient interference. Theoretically and empirically, we demonstrate the superiority of IPMC over state-of-the-art methods.

【11】 Fast Hypergraph Regularized Nonnegative Tensor Ring Factorization Based on Low-Rank Approximation 标题:基于低秩逼近的快速超图正则化非负张量环分解 链接:https://arxiv.org/abs/2109.02314

作者:Xinhai Zhao,Yuyuan Yu,Guoxu Zhou,Qibin Zhao,Weijun Sun 机构:School of Automation, Guangdong University of Technology, Guangzhou, China, The Center for Advanced Intelligence Project (AIP), RIKEN , Tokyo, Japan, Guangdong Key Laboratory of IoT Information Technology, Guangzhou, China 摘要:对于高维数据表示,采用流形学习的非负张量环(NTR)分解已成为一种很有希望的利用多维结构和从张量数据中提取特征的模型。然而,现有的方法,如图正则化张量环分解(GNTR)只对对象的成对相似性进行建模。对于具有复杂流形结构的张量数据,图不能准确地构造相似关系。本文将超图引入到NTR框架中,进一步提高了特征提取的效率,并在此基础上提出了超图正则化非负张量环分解(HGNTR)方法。为了降低计算复杂度和抑制噪声,我们采用低秩近似技巧来加速HGNTR(称为LraHGNTR)。我们的实验结果表明,与其他最先进的算法相比,所提出的HGNTR和LraHGNTR在集群任务中可以获得更高的性能,此外,LraHGNTR可以在不降低精度的情况下大大减少运行时间。 摘要:For the high dimensional data representation, nonnegative tensor ring (NTR) decomposition equipped with manifold learning has become a promising model to exploit the multi-dimensional structure and extract the feature from tensor data. However, the existing methods such as graph regularized tensor ring decomposition (GNTR) only models the pair-wise similarities of objects. For tensor data with complex manifold structure, the graph can not exactly construct similarity relationships. In this paper, in order to effectively utilize the higher-dimensional and complicated similarities among objects, we introduce hypergraph to the framework of NTR to further enhance the feature extraction, upon which a hypergraph regularized nonnegative tensor ring decomposition (HGNTR) method is developed. To reduce the computational complexity and suppress the noise, we apply the low-rank approximation trick to accelerate HGNTR (called LraHGNTR). Our experimental results show that compared with other state-of-the-art algorithms, the proposed HGNTR and LraHGNTR can achieve higher performance in clustering tasks, in addition, LraHGNTR can greatly reduce running time without decreasing accuracy.

【12】 Urban Fire Station Location Planning: A Systematic Approach using Predicted Demand and Service Quality Index 标题:城市消防站选址规划:基于预测需求和服务质量指标的系统方法 链接:https://arxiv.org/abs/2109.02160

作者:Arnab Dey,Andrew Heger,Darin England 机构:Electrical and Computer Engineering, University of Minnesota, Twin Cities, Minneapolis, MN, USA., Fire Chief, City of Victoria Fire Department, Victoria, MN, USA 摘要:在本文中,我们提出了一种系统的消防站选址规划方法。我们开发了一个基于随机森林的机器学习模型用于需求预测,并进一步利用该模型定义了一个衡量城市环境中消防服务质量的广义指标。我们的模型建立在从多个不同来源收集的空间数据之上。设施规划的有效性取决于候选位置(以及现有消防站,如有)的选择;此外,还需要考虑从这些候选位置到需求点的行驶时间,以满足消防安全标准。为此,我们提出了一种基于行驶时间的聚类技术来识别合适的候选位置。最后,我们构建了一个优化问题来选择安装新消防站的最佳位置。该优化问题建立在基于整数规划的最大覆盖问题之上。我们与美国明尼苏达州维多利亚市消防局合作,对所提出的方法进行了详细的实验研究:我们的需求预测模型实现了约70%的真阳性率和约22%的假阳性率;我们协助维多利亚消防局用该方法为新消防站选址,并给出了按该方法建议新建一处设施后的详细改进统计结果。 摘要:In this article, we propose a systematic approach for fire station location planning. We develop a machine learning model, based on Random Forest, for demand prediction and utilize the model further to define a generalized index to measure quality of fire service in urban settings. Our model is built upon spatial data collected from multiple different sources. Efficacy of proper facility planning depends on choice of candidates where fire stations can be located along with existing stations, if any. Also, the travel time from these candidates to demand locations need to be taken care of to maintain fire safety standard. Here, we propose a travel time based clustering technique to identify suitable candidates. Finally, we develop an optimization problem to select best locations to install new fire stations. Our optimization problem is built upon maximum coverage problem, based on integer programming. We present a detailed experimental study of our proposed approach in collaboration with city of Victoria Fire Department, MN, USA. Our demand prediction model achieves true positive rate of 70% and false positive rate of 22% approximately. We aid Victoria Fire Department to select a location for a new fire station using our approach. We present detailed results on improvement statistics by locating a new facility, as suggested by our methodology, in the city of Victoria.
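"最大覆盖 + 整数规划"的选址骨架可以用PuLP写成几行(需求点、候选站与覆盖关系均为虚构数据,仅示意问题的数学结构):

import pulp

demands = {"d1": 30, "d2": 20, "d3": 25, "d4": 10}        # 需求点及其预测需求量
covers = {"s1": {"d1", "d2"}, "s2": {"d2", "d3"}, "s3": {"d3", "d4"}}  # 行驶时间阈值内可覆盖的需求点
K = 2                                                     # 新建消防站数量上限

x = pulp.LpVariable.dicts("open", covers, cat="Binary")
z = pulp.LpVariable.dicts("covered", demands, cat="Binary")
prob = pulp.LpProblem("max_coverage", pulp.LpMaximize)
prob += pulp.lpSum(demands[d] * z[d] for d in demands)    # 最大化被覆盖的需求总量
for d in demands:                                         # 需求被覆盖当且仅当至少开一个能覆盖它的站
    prob += z[d] <= pulp.lpSum(x[s] for s in covers if d in covers[s])
prob += pulp.lpSum(x.values()) <= K
prob.solve(pulp.PULP_CBC_CMD(msg=False))
print([s for s in covers if x[s].value() == 1])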

【13】 NAS-OoD: Neural Architecture Search for Out-of-Distribution Generalization 标题:NAS-OoD:面向分布外泛化的神经结构搜索 链接:https://arxiv.org/abs/2109.02038

作者:Haoyue Bai,Fengwei Zhou,Lanqing Hong,Nanyang Ye,S. -H. Gary Chan,Zhenguo Li 机构:The Hong Kong University, of Science and Technology, Huawei Noah’s Ark Lab, Shanghai Jiao Tong University, S.-H. Gary Chan 备注:Accepted by ICCV2021 摘要:分布外(OoD)泛化的最新进展揭示了深度学习模型对分布变化的鲁棒性。然而,现有的研究主要集中在面向对象的算法上,如不变风险最小化、领域泛化或稳定学习,而没有考虑深层模型结构对面向对象泛化的影响,这可能导致性能的次优。神经结构搜索(NAS)方法根据训练数据搜索结构,这可能导致OoD任务泛化能力差。在这项工作中,我们提出了稳健的面向对象泛化的神经结构搜索(NAS-OoD),它通过梯度下降优化了该结构在生成的面向对象数据上的性能。具体来说,数据生成器通过最大化不同神经结构计算的损失来合成OoD数据,而结构搜索的目标是找到使合成OoD数据损失最小化的最佳结构参数。数据生成器和神经结构以端到端的方式进行联合优化,并且极大极小训练过程有效地发现鲁棒结构,这些结构可以很好地概括不同的分布变化。大量的实验结果表明,NAS OoD在各种OoD泛化基准测试上取得了优异的性能,深度模型的参数数量要少得多。此外,在一个真实的行业数据集上,与最先进的方法相比,所提出的NAS OoD方法将错误率降低了70%以上,证明了所提出的方法在实际应用中的实用性。 摘要:Recent advances on Out-of-Distribution (OoD) generalization reveal the robustness of deep learning models against distribution shifts. However, existing works focus on OoD algorithms, such as invariant risk minimization, domain generalization, or stable learning, without considering the influence of deep model architectures on OoD generalization, which may lead to sub-optimal performance. Neural Architecture Search (NAS) methods search for architecture based on its performance on the training data, which may result in poor generalization for OoD tasks. In this work, we propose robust Neural Architecture Search for OoD generalization (NAS-OoD), which optimizes the architecture with respect to its performance on generated OoD data by gradient descent. Specifically, a data generator is learned to synthesize OoD data by maximizing losses computed by different neural architectures, while the goal for architecture search is to find the optimal architecture parameters that minimize the synthetic OoD data losses. The data generator and the neural architecture are jointly optimized in an end-to-end manner, and the minimax training process effectively discovers robust architectures that generalize well for different distribution shifts. Extensive experimental results show that NAS-OoD achieves superior performance on various OoD generalization benchmarks with deep models having a much fewer number of parameters. In addition, on a real industry dataset, the proposed NAS-OoD method reduces the error rate by more than 70% compared with the state-of-the-art method, demonstrating the proposed method's practicality for real applications.

【14】 Sparse-MLP: A Fully-MLP Architecture with Conditional Computation 标题:稀疏-MLP:一种支持条件计算的全MLP体系结构 链接:https://arxiv.org/abs/2109.02008

作者:Yuxuan Lou,Fuzhao Xue,Zangwei Zheng,Yang You 机构:National University of Singapore 摘要:专家混合(MoE)加稀疏条件计算已被证明是一种有效的架构,可在计算成本相当的前提下将基于注意力的模型扩展到更多参数。在本文中,我们提出了Sparse-MLP,用稀疏MoE层扩展最近的MLP-Mixer模型,以获得计算上更高效的架构。我们将MLP-Mixer模型中的一部分稠密MLP块替换为稀疏块。在每个稀疏块中,我们应用两个阶段的MoE层:一个阶段的MLP专家在通道内沿图像块维度混合信息,另一个阶段的MLP专家在图像块内沿通道维度混合信息。此外,为了降低路由的计算量并提高专家容量,我们在每个稀疏块中设计了重表示(Re-represent)层。这些层通过两个简单而有效的线性变换来重新缩放图像表示。通过使用MoCo v3算法在ImageNet-1k上进行预训练,我们的模型在多个下游图像分类任务上,以可比的参数量和更少的计算成本优于稠密MLP模型。 摘要:Mixture of Experts (MoE) with sparse conditional computation has been proved an effective architecture for scaling attention-based models to more parameters with comparable computation cost. In this paper, we propose Sparse-MLP, scaling the recent MLP-Mixer model with sparse MoE layers, to achieve a more computation-efficient architecture. We replace a subset of dense MLP blocks in the MLP-Mixer model with Sparse blocks. In each Sparse block, we apply two stages of MoE layers: one with MLP experts mixing information within channels along image patch dimension, one with MLP experts mixing information within patches along the channel dimension. Besides, to reduce computational cost in routing and improve experts capacity, we design Re-represent layers in each Sparse block. These layers are to re-scale image representations by two simple but effective linear transformations. By pre-training on ImageNet-1k with MoCo v3 algorithm, our models can outperform dense MLP models with comparable parameters and less computational cost on several downstream image classification tasks.
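MoE稀疏条件计算的核心机制(top-1门控路由)可示意如下(专家为小MLP;容量限制与负载均衡损失等细节从略,结构参数为假设):

import torch
import torch.nn as nn

class Top1MoE(nn.Module):
    def __init__(self, dim, n_experts=4):
        super().__init__()
        self.gate = nn.Linear(dim, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 2), nn.GELU(), nn.Linear(dim * 2, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                      # x: (tokens, dim)
        scores = self.gate(x).softmax(-1)
        w, idx = scores.max(-1)                # 每个token只路由到1个专家
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            m = idx == e
            if m.any():
                out[m] = w[m].unsqueeze(1) * expert(x[m])   # 门控权重保持可微
        return out

y = Top1MoE(64)(torch.randn(10, 64))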

【15】 Deep Saliency Prior for Reducing Visual Distraction 标题:用于降低视觉分心的深度显著性先验 链接:https://arxiv.org/abs/2109.01980

作者:Kfir Aberman,Junfeng He,Yossi Gandelsman,Inbar Mosseri,David E. Jacobs,Kai Kohlhoff,Yael Pritch,Michael Rubinstein 机构:Google Research 备注:this https URL 摘要:仅使用一个经过训练、用于预测人们注视图像位置的模型,而无需额外的训练数据,我们就可以产生一系列强大的编辑效果,以减少图像中的分心。给定图像和指定待编辑区域的遮罩,我们通过最先进的显著性模型进行反向传播,以参数化可微编辑算子,从而降低遮罩区域内的显著性。我们演示了几种算子,包括:重新着色算子,它学习施加颜色变换,将干扰物伪装并混合到周围环境中;扭曲算子,它扭曲显著性较低的图像区域以覆盖干扰物,逐渐将对象折叠为自身并有效地移除它们(效果类似于图像修复);GAN算子,它利用语义先验,用看似合理、显著性更低的替代内容完全替换图像区域。由此产生的效果与关于人类视觉系统的认知研究一致(例如,由于颜色不匹配是显著的,重新着色算子学习将对象的颜色与其周围环境协调以降低其显著性),而且重要的是,所有这些都仅在预训练显著性模型的指导下实现,没有额外的监督。我们展示了各种自然图像上的结果,并进行了感知研究,以评估和验证观看者在原始图像和我们编辑结果之间眼睛注视的变化。 摘要:Using only a model that was trained to predict where people look at images, and no additional training data, we can produce a range of powerful editing effects for reducing distraction in images. Given an image and a mask specifying the region to edit, we backpropagate through a state-of-the-art saliency model to parameterize a differentiable editing operator, such that the saliency within the masked region is reduced. We demonstrate several operators, including: a recoloring operator, which learns to apply a color transform that camouflages and blends distractors into their surroundings; a warping operator, which warps less salient image regions to cover distractors, gradually collapsing objects into themselves and effectively removing them (an effect akin to inpainting); a GAN operator, which uses a semantic prior to fully replace image regions with plausible, less salient alternatives. The resulting effects are consistent with cognitive research on the human visual system (e.g., since color mismatch is salient, the recoloring operator learns to harmonize objects' colors with their surrounding to reduce their saliency), and, importantly, are all achieved solely through the guidance of the pretrained saliency model, with no additional supervision. We present results on a variety of natural images and conduct a perceptual study to evaluate and validate the changes in viewers' eye-gaze between the original images and our edited results.
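该方法的骨架是:冻结显著性模型,把可微编辑算子的参数当作优化变量,沿显著性损失反向传播。下面的 PyTorch 示意中,显著性网络只是一个随机初始化的占位模型(真实场景应替换为预训练的 SOTA 显著性预测器),编辑算子取最简单的“遮罩内逐通道仿射重新着色”,两者均为此处的假设:

```python
import torch
import torch.nn as nn

# 占位显著性模型:仅作结构示意,实际应换成预训练并冻结的显著性网络
saliency_model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.Conv2d(8, 1, 3, padding=1), nn.Sigmoid())
for p in saliency_model.parameters():
    p.requires_grad_(False)                     # 冻结:只作为先验,不被训练

img = torch.rand(1, 3, 64, 64)                  # 待编辑图像
mask = torch.zeros(1, 1, 64, 64)
mask[..., 20:40, 20:40] = 1.0                   # 指定要“去干扰”的编辑区域

# 可微“重新着色”算子:遮罩内的逐通道仿射颜色变换
gain = torch.ones(1, 3, 1, 1, requires_grad=True)
bias = torch.zeros(1, 3, 1, 1, requires_grad=True)
opt = torch.optim.Adam([gain, bias], lr=5e-2)

for step in range(100):
    edited = img * (1 - mask) + (img * gain + bias).clamp(0, 1) * mask
    sal = saliency_model(edited)
    loss = (sal * mask).mean()                  # 只压低遮罩区域内的显著性
    opt.zero_grad(); loss.backward(); opt.step()
```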

【16】 Barycentric distribution alignment and manifold-restricted invertibility for domain generalization 标题:域泛化的重心分布对齐与流形受限可逆性 链接:https://arxiv.org/abs/2109.01902

作者:Boyang Lyu,Thuan Nguyen,Prakash Ishwar,Matthias Scheutz,Shuchin Aeron 摘要:对于假设由公共表示函数加标签函数构成的域泛化(DG)问题,我们指出了现有方法的一个缺点:它们未能显式优化未见域风险的一个广为人知且被广泛采用的上界中出现的、依赖于待学习表示的那一项。为此,我们首先推导出一个新的预测风险上界。我们表明,对待学习的表示施加一个温和的假设,即流形受限可逆性,就足以处理这个问题。此外,与现有方法不同,我们的新上界不需要损失函数的Lipschitz假设。另外,表示空间中的分布差异通过Wasserstein-2重心成本来处理。在此背景下,我们创造性地利用经典与近期的传输不等式,这些不等式将多种最优传输度量,特别是$L^1$距离(也称为全变差距离)和Wasserstein-2距离,与Kullback-Leibler散度联系起来。这些分析和见解催生了一种新的DG表示学习成本,它加性地平衡三个相互竞争的目标:1)通过交叉熵最小化各可见域上的分类误差,2)通过Wasserstein-2重心成本在表示空间中强制域不变性,3)通过两种机制之一(即基于自编码器的重建损失或互信息损失)促进非退化、近似可逆的表示。值得注意的是,所提出的算法完全绕过了当前许多域泛化方法中典型的对抗性训练机制。在几个标准数据集上的仿真结果表明,与几个著名的DG算法相比,该算法具有更优的性能。 摘要:For the Domain Generalization (DG) problem where the hypotheses are composed of a common representation function followed by a labeling function, we point out a shortcoming in existing approaches that fail to explicitly optimize for a term, appearing in a well-known and widely adopted upper bound to the risk on the unseen domain, that is dependent on the representation to be learned. To this end, we first derive a novel upper bound to the prediction risk. We show that imposing a mild assumption on the representation to be learned, namely manifold restricted invertibility, is sufficient to deal with this issue. Further, unlike existing approaches, our novel upper bound doesn't require the assumption of Lipschitzness of the loss function. In addition, the distributional discrepancy in the representation space is handled via the Wasserstein-2 barycenter cost. In this context, we creatively leverage old and recent transport inequalities, which link various optimal transport metrics, in particular the $L^1$ distance (also known as the total variation distance) and the Wasserstein-2 distances, with the Kullback-Leibler divergence. These analyses and insights motivate a new representation learning cost for DG that additively balances three competing objectives: 1) minimizing classification error across seen domains via cross-entropy, 2) enforcing domain-invariance in the representation space via the Wasserstein-2 barycenter cost, and 3) promoting non-degenerate, nearly-invertible representation via one of two mechanisms, viz., an autoencoder-based reconstruction loss or a mutual information loss. It is to be noted that the proposed algorithms completely bypass the use of any adversarial training mechanism that is typical of many current domain generalization approaches. Simulation results on several standard datasets demonstrate superior performance compared to several well-known DG algorithms.
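下面是论文三项加性目标的一个玩具化 PyTorch 示意(非原实现)。注意其中的“域不变性”项只用各域特征均值到总体均值的距离作为 Wasserstein-2 重心成本的一阶矩替代,“近可逆性”项取自编码器重建损失;这两处简化以及网络与数据均为此处的假设:

```python
import torch
import torch.nn as nn

enc = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 8))   # 表示函数
dec = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 20))   # 自编码器解码器
clf = nn.Linear(8, 2)                                                 # 标签函数
opt = torch.optim.Adam(
    [*enc.parameters(), *dec.parameters(), *clf.parameters()], lr=1e-3)
ce, mse = nn.CrossEntropyLoss(), nn.MSELoss()

# 两个假设的可见域(玩具数据,仅用于跑通流程)
doms = [(torch.randn(128, 20) + s, torch.randint(0, 2, (128,))) for s in (0.0, 2.0)]

for step in range(100):
    feats = [enc(x) for x, _ in doms]
    # (1) 各可见域上的交叉熵分类损失
    loss_cls = sum(ce(clf(f), y) for f, (_, y) in zip(feats, doms))
    # (2) 域不变性:各域特征均值到“重心”(总体均值)的距离,
    #     作为 Wasserstein-2 重心成本的一阶矩替代(简化假设)
    bary = torch.stack([f.mean(0) for f in feats]).mean(0)
    loss_align = sum(((f.mean(0) - bary) ** 2).sum() for f in feats)
    # (3) 近可逆性:自编码器重建损失
    loss_rec = sum(mse(dec(f), x) for f, (x, _) in zip(feats, doms))
    loss = loss_cls + 0.1 * loss_align + 0.1 * loss_rec
    opt.zero_grad(); loss.backward(); opt.step()
```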

【17】 RAMA: A Rapid Multicut Algorithm on GPU 标题:RAMA:一种GPU上的快速多切割算法 链接:https://arxiv.org/abs/2109.01838

作者:Ahmed Abbas,Paul Swoboda 机构:Max Planck Institute for Informatics, Saarland Informatics Campus 摘要:我们针对multicut(又称相关聚类)问题提出了一种高度并行的原始-对偶算法,该问题是在机器学习和计算机视觉中应用广泛的经典图聚类问题。我们的算法由三个递归执行的步骤组成:(1)找出与底层multicut松弛中被违反的不等式相对应的冲突环;(2)在边与环之间执行消息传递,优化由所发现的违反环导出的拉格朗日松弛,从而产生约化成本;(3)通过矩阵-矩阵乘法收缩约化成本高的边。我们的算法给出原始解,以及可估计与最优值之间距离的对偶下界。我们在GPU上实现了该算法,与在CPU上运行的传统串行算法相比,在不牺牲解质量的情况下,执行速度提高了一到两个数量级。我们可以在几秒钟内以很小的原始-对偶间隙解决多达$\mathcal{O}(10^8)$个变量的超大规模基准问题。我们的代码发布在 https://github.com/pawelswoboda/RAMA。 摘要:We propose a highly parallel primal-dual algorithm for the multicut (a.k.a. correlation clustering) problem, a classical graph clustering problem widely used in machine learning and computer vision. Our algorithm consists of three steps executed recursively: (1) Finding conflicted cycles that correspond to violated inequalities of the underlying multicut relaxation, (2) Performing message passing between the edges and cycles to optimize the Lagrange relaxation coming from the found violated cycles producing reduced costs and (3) Contracting edges with high reduced costs through matrix-matrix multiplications. Our algorithm produces primal solutions and dual lower bounds that estimate the distance to optimum. We implement our algorithm on GPUs and show resulting one to two order-of-magnitudes improvements in execution speed without sacrificing solution quality compared to traditional serial algorithms that run on CPUs. We can solve very large scale benchmark problems with up to $\mathcal{O}(10^8)$ variables in a few seconds with small primal-dual gaps. We make our code available at https://github.com/pawelswoboda/RAMA.
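下面的纯 Python 示意并非 RAMA 本身(没有 GPU 并行,也没有环消息传递),只是用 GAEC 风格的贪心加性边收缩说明 multicut 的核心操作:不断收缩当前成本最高的正成本(倾向合并)边,把平行边权重相加;剩下的负成本边即构成割:

```python
# 极简 CPU 示意(并非 RAMA):贪心加性边收缩求 multicut 的一个可行解
def greedy_multicut(num_nodes, edges):
    """edges: {(u, v): w},约定 u < v;w>0 倾向合并,w<0 倾向切割。"""
    parent = list(range(num_nodes))
    def find(x):                                 # 带路径压缩的并查集
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    costs = dict(edges)
    while True:
        # 取当前权重最高的簇间边;若为非正则停止收缩
        best = max(costs.items(), key=lambda kv: kv[1], default=None)
        if best is None or best[1] <= 0:
            break
        (u, v), _ = best
        parent[find(v)] = find(u)                # 收缩:合并两个簇
        # 重建簇间边表,平行边权重相加;簇内边不再是候选割边
        new_costs = {}
        for (a, b), w in costs.items():
            ra, rb = find(a), find(b)
            if ra != rb:
                key = (min(ra, rb), max(ra, rb))
                new_costs[key] = new_costs.get(key, 0.0) + w
        costs = new_costs
    return [find(i) for i in range(num_nodes)]   # 每个节点的簇标签

# 三角形:两条强“合并”边压过一条弱“切割”边,三点并为一簇
print(greedy_multicut(3, {(0, 1): 5.0, (1, 2): 4.0, (0, 2): -1.0}))  # [0, 0, 0]
```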

【18】 On the Complexity of Computing Markov Perfect Equilibrium in General-Sum Stochastic Games 标题:关于一般和随机博弈中马尔可夫完美均衡计算的复杂性 链接:https://arxiv.org/abs/2109.01795

作者:Xiaotie Deng,Yuhao Li,David Henry Mguni,Jun Wang,Yaodong Yang 机构:Center on Frontiers of Computing Studies, Peking University, Huawei R&D UK, University College London, King’s College London 摘要:类似于马尔可夫决策过程在强化学习中的作用,随机博弈(SGs)为多Agent强化学习(MARL)和序贯Agent交互研究奠定了基础。在本文中,我们证明了在指数精度内计算有限状态折扣随机博弈中的近似马尔可夫完美均衡(MPE)是PPAD-完全的。我们在策略空间中采用一个具有多项式有界描述的函数,将MPE计算转化为一个不动点问题——即使随机博弈可能要求每个智能体拥有关于状态数呈指数级数量的纯策略。该完全性结果来自将不动点问题归约到 {\sc End of the Line} 问题。我们的结果表明,除非 NP = co-NP,否则在SGs中寻找MPE不太可能是NP-难的。我们的工作为MARL研究在一般和SGs上研究MPE计算、并发展出如目前零和SGs上那样富有成效的算法提供了信心。 摘要:Similar to the role of Markov decision processes in reinforcement learning, Stochastic Games (SGs) lay the foundation for the study of multi-agent reinforcement learning (MARL) and sequential agent interactions. In this paper, we derive that computing an approximate Markov Perfect Equilibrium (MPE) in a finite-state discounted Stochastic Game within the exponential precision is \textbf{PPAD}-complete. We adopt a function with a polynomially bounded description in the strategy space to convert the MPE computation to a fixed-point problem, even though the stochastic game may demand an exponential number of pure strategies, in the number of states, for each agent. The completeness result follows the reduction of the fixed-point problem to {\sc End of the Line}. Our results indicate that finding an MPE in SGs is highly unlikely to be \textbf{NP}-hard unless \textbf{NP}=\textbf{co-NP}. Our work offers confidence for MARL research to study MPE computation on general-sum SGs and to develop fruitful algorithms as currently on zero-sum SGs.

【19】 MLCTR: A Fast Scalable Coupled Tensor Completion Based on Multi-Layer Non-Linear Matrix Factorization 标题:MLCTR:一种基于多层非线性矩阵分解的快速可伸缩耦合张量补全 链接:https://arxiv.org/abs/2109.01773

作者:Ajim Uddin,Dan Zhou,Xinyuan Tao,Chia-Ching Chou,Dantong Yu 机构:New Jersey Institute of Technology, Central Michigan University 摘要:公司盈利预测在投资决策、股息预期和股价中起着至关重要的作用。它通常涉及多个与张量兼容的数据集,这些数据集具有非线性多向关系、时空结构和不同程度的稀疏性。现有的非线性张量补全算法倾向于学习含噪的嵌入并产生过拟合。本文聚焦于张量补全问题中的嵌入学习,提出了一种新的用于张量分解与补全的多层神经网络结构(MLCTR)。该网络结构具有多个优点:用一系列低秩矩阵分解(MF)构建块来最小化过拟合,在每层中交错使用传递函数以引入非线性,并用旁路连接缓解梯度消失问题、增加神经网络的深度。此外,该模型采用基于随机梯度下降(SGD)的优化方法,以便在训练中快速收敛。我们的算法在插补EPS数据中的缺失值方面非常高效。实验证实,我们在因子矩阵中引入非线性的策略在嵌入学习和端到端张量模型中表现出色,并优于把非线性放在由因子矩阵重构张量阶段的方法。 摘要:Firms earning prediction plays a vital role in investment decisions, dividends expectation, and share price. It often involves multiple tensor-compatible datasets with non-linear multi-way relationships, spatiotemporal structures, and different levels of sparsity. Current non-linear tensor completion algorithms tend to learn noisy embedding and incur overfitting. This paper focuses on the embedding learning aspect of the tensor completion problem and proposes a new multi-layer neural network architecture for tensor factorization and completion (MLCTR). The network architecture entails multiple advantages: a series of low-rank matrix factorizations (MF) building blocks to minimize overfitting, interleaved transfer functions in each layer for non-linearity, and by-pass connections to reduce the gradient diminishing problem and increase the depths of neural networks. Furthermore, the model employs Stochastic Gradient Descent(SGD) based optimization for fast convergence in training. Our algorithm is highly efficient for imputing missing values in the EPS data. Experiments confirm that our strategy of incorporating non-linearity in factor matrices demonstrates impressive performance in embedding learning and end-to-end tensor models, and outperforms approaches with non-linearity in the phase of reconstructing tensors from factor matrices.
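下面用 PyTorch 勾勒 MLCTR 三个设计点(低秩 MF 块、非线性传递函数、旁路连接)的一个玩具化示意(非论文实现;嵌入维度、秩与数据均为此处的假设),以 SGD 在随机观测项上训练:

```python
import torch
import torch.nn as nn

class LowRankBlock(nn.Module):
    """低秩 MF 构建块 + 非线性 + 旁路(残差)连接。"""
    def __init__(self, dim, rank=4):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)
        self.up = nn.Linear(rank, dim, bias=False)
    def forward(self, h):
        return h + torch.tanh(self.up(self.down(h)))  # 旁路连接缓解梯度消失

I, J, K, d = 30, 20, 10, 8                       # 三个模式的大小与嵌入维度
emb = nn.ModuleList([nn.Embedding(n, d) for n in (I, J, K)])
blocks = nn.Sequential(LowRankBlock(3 * d), LowRankBlock(3 * d))
head = nn.Linear(3 * d, 1)
params = [*emb.parameters(), *blocks.parameters(), *head.parameters()]
opt = torch.optim.SGD(params, lr=0.05)

# 玩具观测:随机下标与取值(真实场景为稀疏 EPS 张量的已观测项)
idx = torch.stack([torch.randint(0, n, (512,)) for n in (I, J, K)], dim=1)
val = torch.randn(512)

for step in range(200):                          # 基于 SGD 的训练循环
    h = torch.cat([emb[m](idx[:, m]) for m in range(3)], dim=-1)
    pred = head(blocks(h)).squeeze(-1)
    loss = ((pred - val) ** 2).mean()            # 仅在观测项上计算 MSE
    opt.zero_grad(); loss.backward(); opt.step()
```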

【20】 Assessing the Knowledge State of Online Students -- New Data, New Approaches, Improved Accuracy 标题:网络学生知识状况测评--新数据、新方法、高准确度 链接:https://arxiv.org/abs/2109.01753

作者:Robin Schmucker,Jingbo Wang,Shijia Hu,Tom M. Mitchell 机构:Carnegie Mellon University 摘要:我们考虑在学生修读在线课程的过程中评估个体学生不断变化的知识状态的问题。这个学生表现(SP)建模问题,也称为知识追踪,是构建自适应在线教学系统的关键步骤。具体来说,我们研究如何利用类型多样、规模庞大的学生日志数据,训练能够预测未来学生知识状态的准确机器学习模型。这项研究首次使用了最近由四个不同的智能教学系统发布的四个非常大的数据集。我们的成果包括一种新的机器学习方法,它确立了SP建模的最新技术水平,并在几个方面改进了早期方法:首先,我们通过引入可以很容易地从传统问题-响应日志中计算出来的新特征(例如,学生最近若干次作答中的模式)来提高准确性。其次,我们利用了超越问题-响应对的学生历史特征(例如,学生观看或跳过了哪些视频片段)以及课程中先修结构的信息。第三,我们针对课程的不同方面训练多个专门化模型(例如,分别专注于学生历史的早期与后期部分),然后组合这些专门化模型,形成对学生知识的组合预测。综合来看,这些创新在这四个数据集上取得了0.807的平均AUC得分,高于此前最佳逻辑回归方法的0.766,也优于最先进的深度神经网络方法。重要的是,我们观察到,在每个数据集中,我们的三项方法创新中的每一项都带来持续的改进,这表明我们的方法具有普遍的实用性,并且很可能也会为其他在线教学系统带来改进。 摘要:We consider the problem of assessing the changing knowledge state of individual students as they go through online courses. This student performance (SP) modeling problem, also known as knowledge tracing, is a critical step for building adaptive online teaching systems. Specifically, we conduct a study of how to utilize various types and large amounts of students log data to train accurate machine learning models that predict the knowledge state of future students. This study is the first to use four very large datasets made available recently from four distinct intelligent tutoring systems. Our results include a new machine learning approach that defines a new state of the art for SP modeling, improving over earlier methods in several ways: First, we achieve improved accuracy by introducing new features that can be easily computed from conventional question-response logs (e.g., the pattern in the student's most recent answers). Second, we take advantage of features of the student history that go beyond question-response pairs (e.g., which video segments the student watched, or skipped) as well as information about prerequisite structure in the curriculum. Third, we train multiple specialized modeling models for different aspects of the curriculum (e.g., specializing in early versus later segments of the student history), then combine these specialized models to create a group prediction of student knowledge. Taken together, these innovations yield an average AUC score across these four datasets of 0.807 compared to the previous best logistic regression approach score of 0.766, and also outperforming state-of-the-art deep neural net approaches. Importantly, we observe consistent improvements from each of our three methodological innovations, in each dataset, suggesting that our methods are of general utility and likely to produce improvements for other online tutoring systems as well.
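下面是“日志特征 + 分段专门化模型 + 组合预测”这一流程的玩具化示意(非论文实现;特征定义、分段阈值与数据全部为此处的假设),用 scikit-learn 的逻辑回归演示:

```python
# 极简示意:从问答日志构造特征,为学生历史的早期/后期分别训练
# 专门化的逻辑回归,再平均其概率输出作为组合预测。
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# 玩具日志特征:历史长度、最近5次作答的正确率、已掌握的先修知识点数
X = np.column_stack([rng.integers(1, 100, 2000),
                     rng.random(2000),
                     rng.integers(0, 5, 2000)]).astype(float)
# 玩具标签:下一题是否答对(由特征加噪声生成)
y = (0.5 * X[:, 1] + 0.1 * X[:, 2] + rng.normal(0, 0.2, 2000) > 0.4).astype(int)

early = X[:, 0] < 20                      # 按历史长度划分“早期/后期”两段
models = {}
for name, m in (("early", early), ("late", ~early)):
    models[name] = LogisticRegression().fit(X[m], y[m])

def ensemble_predict(x):
    """组合两个专门化模型的概率输出(简单平均,作为组合预测的示意)。"""
    probs = [m.predict_proba(x.reshape(1, -1))[0, 1] for m in models.values()]
    return float(np.mean(probs))

print(ensemble_predict(X[0]))
```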

【21】 CodeNeRF: Disentangled Neural Radiance Fields for Object Categories 标题:CodeNeRF:用于对象类别的解缠神经辐射场 链接:https://arxiv.org/abs/2109.01750

作者:Wonbong Jang,Lourdes Agapito 备注:10 pages, 15 figures, ICCV 2021 摘要:CodeNeRF是一种隐式3D神经表示,它学习对象形状和纹理在一个类别内的变化,并且可以从一组带位姿的图像中训练,以合成未见过对象的新视图。与特定于场景的原始NeRF不同,CodeNeRF通过学习单独的嵌入来解耦形状和纹理。在测试时,给定未见过对象的单张无位姿图像,CodeNeRF通过优化联合估计相机视点以及形状和外观潜码。未见过的对象可以从单张图像重建,然后从新的视点渲染,或者通过改变潜码来编辑其形状和纹理。我们在SRN基准上进行了实验,结果表明CodeNeRF能够很好地泛化到未见过的对象,并与测试时需要已知相机位姿的方法达到同等性能。我们在真实图像上的结果表明,CodeNeRF能够弥合仿真到现实(sim-to-real)的差距。项目页面:\url{https://github.com/wayne1123/code-nerf} 摘要:CodeNeRF is an implicit 3D neural representation that learns the variation of object shapes and textures across a category and can be trained, from a set of posed images, to synthesize novel views of unseen objects. Unlike the original NeRF, which is scene specific, CodeNeRF learns to disentangle shape and texture by learning separate embeddings. At test time, given a single unposed image of an unseen object, CodeNeRF jointly estimates camera viewpoint, and shape and appearance codes via optimization. Unseen objects can be reconstructed from a single image, and then rendered from new viewpoints or their shape and texture edited by varying the latent codes. We conduct experiments on the SRN benchmark, which show that CodeNeRF generalises well to unseen objects and achieves on-par performance with methods that require known camera pose at test time. Our results on real-world images demonstrate that CodeNeRF can bridge the sim-to-real gap. Project page: \url{https://github.com/wayne1123/code-nerf}
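测试时优化是 CodeNeRF 的关键:冻结条件模型,仅对形状码、纹理码与相机位姿做梯度下降。下面的 PyTorch 示意用一个随机初始化的可微占位“渲染器”代替预训练的条件 NeRF(占位模型、潜码维度与位姿参数化均为此处的假设),演示这一优化回路:

```python
import torch
import torch.nn as nn

# 占位的“条件式隐式模型”:输入(形状码, 外观码, 相机位姿)输出渲染图。
# 真实 CodeNeRF 中这是预训练并冻结的条件 NeRF,此处仅为可微替身。
renderer = nn.Sequential(nn.Linear(16 + 16 + 6, 64), nn.ReLU(),
                         nn.Linear(64, 3 * 8 * 8))
for p in renderer.parameters():
    p.requires_grad_(False)                          # 冻结模型权重

target = torch.rand(3 * 8 * 8)                       # 单张观测图像(展平)
shape_code = torch.zeros(16, requires_grad=True)     # 形状潜码
tex_code = torch.zeros(16, requires_grad=True)       # 纹理潜码
pose = torch.zeros(6, requires_grad=True)            # 相机位姿(轴角+平移)
opt = torch.optim.Adam([shape_code, tex_code, pose], lr=1e-2)

for step in range(300):                              # 测试时优化:只更新码与位姿
    pred = renderer(torch.cat([shape_code, tex_code, pose]))
    loss = ((pred - target) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```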

【22】 Cohort Characteristics and Factors Associated with Cannabis Use among Adolescents in Canada Using Pattern Discovery and Disentanglement Method 标题:用模式发现和解缠法研究加拿大青少年使用大麻的队列特征和相关因素 链接:https://arxiv.org/abs/2109.01739

作者:Peiyuan Zhou,Andrew K. C. Wong,Yang Yang,Scott T. Leatherdale,Kate Battista,Zahid A. Butt,George Michalopoulos,Helen Chen 机构:System Design Engineering, University of Waterloo, Waterloo, ON, Canada, School of Public Health and Health Systems, University of Waterloo, Waterloo, ON, Canada 备注:21 pages, 3 figures, 4 tables 摘要:COMPASS是一项纵向、前瞻性队列研究,每年收集加拿大各辖区高中学生的数据。我们的目的是发现加拿大青少年中与大麻使用相关的行为因素之间显著的频繁/罕见关联。我们使用COMPASS数据集的一个子集,其中包含18761条9至12年级学生的记录和31项选定特征(属性),涵盖从生活习惯到学业表现的各类特性。然后,我们使用我们开发的模式发现与解缠(PDD)算法,从数据集中检测强关联和罕见(但具有统计显著性)的关联。PDD使用从解缠统计空间(称为重投影调整标准化残差向量空间,记作RARV)导出的判别准则。它优于文献中报道的使用其他准则(即支持度和置信度)的流行方法。关联结果表明,PDD能够发现:i)以簇形式组织的、更小而简洁的显著关联集合;ii)有人群健康相关研究支持的频繁及罕见但显著的模式;iii)来自群体极度不平衡数据集的模式(多数类:少数类=88.3%:11.7%)。 摘要:COMPASS is a longitudinal, prospective cohort study collecting data annually from students attending high school in jurisdictions across Canada. We aimed to discover significant frequent/rare associations of behavioral factors among Canadian adolescents related to cannabis use. We use a subset of COMPASS dataset which contains 18,761 records of students in grades 9 to 12 with 31 selected features (attributes) involving various characteristics, from living habits to academic performance. We then used the Pattern Discovery and Disentanglement (PDD) algorithm that we have developed to detect strong and rare (yet statistically significant) associations from the dataset. PDD used the criteria derived from disentangled statistical spaces (known as Re-projected Adjusted-Standardized Residual Vector Spaces, notated as RARV). It outperformed methods using other criteria (i.e. support and confidence) popular as reported in the literature. Association results showed that PDD can discover: i) a smaller set of succinct significant associations in clusters; ii) frequent and rare, yet significant, patterns supported by population health relevant study; iii) patterns from a dataset with extremely imbalanced groups (majority class: minority class = 88.3%: 11.7%).
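PDD 的判别准则建立在调整标准化残差之上。下面给出经典 Haberman 调整标准化残差在列联表上的计算示意(仅为该准则的基础量,并非 PDD 的完整流程;表中数字为虚构):

```python
import numpy as np

def adjusted_standardized_residuals(table):
    """对列联表计算 Haberman 调整标准化残差;|残差|>1.96 通常视为显著关联。"""
    table = np.asarray(table, dtype=float)
    n = table.sum()
    row = table.sum(axis=1, keepdims=True)       # 行边际
    col = table.sum(axis=0, keepdims=True)       # 列边际
    expected = row @ col / n                     # 独立假设下的期望频数
    var = expected * (1 - row / n) * (1 - col / n)
    return (table - expected) / np.sqrt(var)

# 玩具列联表:行=某行为因素有/无,列=是否使用大麻(数字为假设值)
table = [[120, 30],
         [ 60, 90]]
print(adjusted_standardized_residuals(table).round(2))
```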

【23】 Nonasymptotic one- and two-sample tests in high dimension with unknown covariance structure 标题:协方差结构未知的高维非渐近单样本和双样本检验 链接:https://arxiv.org/abs/2109.01730

作者:Gilles Blanchard,Jean-Baptiste Fermanian 机构: Institut de Mathématiques, CNRS, Inria, Université Paris-Saclay, École Normale Supérieure de Rennes 摘要:设 $\mathbf{X}=(X_i)_{1\leq i\leq n}$ 是 $\mathbb{R}^d$ 中平方可积变量的 i.i.d. 样本,其共同期望 $\mu$ 和协方差矩阵 $\Sigma$ 均未知。我们考虑检验 $\mu$ 是否 $\eta$-接近于零的问题,即检验 $\|\mu\| \leq \eta$ 对 $\|\mu\| \geq \eta+\delta$;我们还处理更一般的两样本均值接近性检验问题。本文的目的是获得最小分离距离 $\delta$ 的非渐近上下界,使得我们能在给定水平上同时控制第一类和第二类错误。主要技术工具是集中不等式:首先是针对用作检验统计量的 $\|\mu\|^2$ 的合适估计量,其次是针对进入该检验统计量分位数的 $\Sigma$ 的算子范数和 Frobenius 范数的估计。这些性质对高斯分布和有界分布均成立。我们特别关注对分布伪维数 $d_*$ 的依赖,其定义为 $d_* := \|\Sigma\|_2^2/\|\Sigma\|_\infty^2$。特别地,对于 $\eta=0$,最小分离距离为 ${\Theta}(d_*^{\frac{1}{4}}\sqrt{\|\Sigma\|_\infty/n})$,而与之形成对比的是 $\mu$ 的极小极大估计距离 ${\Theta}(d_e^{\frac{1}{2}}\sqrt{\|\Sigma\|_\infty/n})$(其中 $d_e:=\|\Sigma\|_1/\|\Sigma\|_\infty$)。这推广了 Baraud (2002) 特别指出的一种现象。 摘要:Let $\mathbf{X} = (X_i)_{1\leq i \leq n}$ be an i.i.d. sample of square-integrable variables in $\mathbb{R}^d$, with common expectation $\mu$ and covariance matrix $\Sigma$, both unknown. We consider the problem of testing if $\mu$ is $\eta$-close to zero, i.e. $\|\mu\| \leq \eta $ against $\|\mu\| \geq (\eta + \delta)$; we also tackle the more general two-sample mean closeness testing problem. The aim of this paper is to obtain nonasymptotic upper and lower bounds on the minimal separation distance $\delta$ such that we can control both the Type I and Type II errors at a given level. The main technical tools are concentration inequalities, first for a suitable estimator of $\|\mu\|^2$ used as a test statistic, and secondly for estimating the operator and Frobenius norms of $\Sigma$ coming into the quantiles of said test statistic. These properties are obtained for Gaussian and bounded distributions. A particular attention is given to the dependence in the pseudo-dimension $d_*$ of the distribution, defined as $d_* := \|\Sigma\|_2^2/\|\Sigma\|_\infty^2$. In particular, for $\eta=0$, the minimum separation distance is ${\Theta}(d_*^{\frac{1}{4}}\sqrt{\|\Sigma\|_\infty/n})$, in contrast with the minimax estimation distance for $\mu$, which is ${\Theta}(d_e^{\frac{1}{2}}\sqrt{\|\Sigma\|_\infty/n})$ (where $d_e:=\|\Sigma\|_1/\|\Sigma\|_\infty$). This generalizes a phenomenon spelled out in particular by Baraud (2002).
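此类检验统计量常用的构造是 $\|\mu\|^2$ 的无偏 U 统计量估计 $\frac{1}{n(n-1)}\sum_{i\neq j}\langle X_i,X_j\rangle$。下面的 NumPy 示意只实现该估计量本身(并非论文的全部细节),数值上应接近真值:

```python
import numpy as np

def squared_mean_norm_ustat(X):
    """‖μ‖² 的无偏 U 统计量估计:(1/(n(n-1))) Σ_{i≠j} <X_i, X_j>。
    利用 E<X_i, X_j> = ‖μ‖² (i≠j,独立样本)。"""
    n = X.shape[0]
    G = X @ X.T                                  # Gram 矩阵
    return (G.sum() - np.trace(G)) / (n * (n - 1))

rng = np.random.default_rng(0)
X = rng.normal(loc=0.3, scale=1.0, size=(500, 50))  # 真值 ‖μ‖² = 50×0.09 = 4.5
print(squared_mean_norm_ustat(X))                   # 应接近 4.5
```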

【24】 Thompson Sampling for Bandits with Clustered Arms 标题:具有聚类臂的多臂老虎机的Thompson采样 链接:https://arxiv.org/abs/2109.01656

作者:Emil Carlsson,Devdatt Dubhashi,Fredrik D. Johansson 机构:Department of Computer Science and Engineering, Chalmers University 备注:Paper accepted to IJCAI-2021. The supplementary material is not part of the IJCAI-21 Proceedings 摘要:我们针对臂被聚类的情形,为随机多臂老虎机及其具有线性期望奖励的上下文变体提出了基于多级汤普森采样方案的算法。我们从理论和实证两方面证明,与使用标准汤普森采样相比,利用给定的簇结构可以显著改善遗憾和计算成本。在随机多臂老虎机的情形下,我们给出了期望累积遗憾的上界,说明它如何依赖于聚类的质量。最后,我们进行了实证评估,结果表明与之前针对聚类臂老虎机提出的算法相比,我们的算法表现良好。 摘要:We propose algorithms based on a multi-level Thompson sampling scheme, for the stochastic multi-armed bandit and its contextual variant with linear expected rewards, in the setting where arms are clustered. We show, both theoretically and empirically, how exploiting a given cluster structure can significantly improve the regret and computational cost compared to using standard Thompson sampling. In the case of the stochastic multi-armed bandit we give upper bounds on the expected cumulative regret showing how it depends on the quality of the clustering. Finally, we perform an empirical evaluation showing that our algorithms perform well compared to previously proposed algorithms for bandits with clustered arms.
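下面是伯努利奖励下“两层汤普森采样”的一个极简 NumPy 示意(并非论文算法的完整形式;簇级后验取聚合计数的 Beta 分布、簇结构与真实成功率均为此处的简化假设):先在簇层面采样选簇,再在选中的簇内采样选臂:

```python
import numpy as np

rng = np.random.default_rng(0)
clusters = [[0, 1], [2, 3, 4]]                  # 给定的臂聚类
true_p = np.array([0.2, 0.25, 0.7, 0.65, 0.6])  # 各臂真实成功率(仿真用)
arm_a = np.ones(5); arm_b = np.ones(5)          # 臂级 Beta(1,1) 后验
clu_a = np.ones(2); clu_b = np.ones(2)          # 簇级 Beta(1,1) 后验(聚合计数)

for t in range(5000):
    c = int(np.argmax(rng.beta(clu_a, clu_b)))  # 第一层:采样簇后验选簇
    arms = clusters[c]
    i = int(np.argmax(rng.beta(arm_a[arms], arm_b[arms])))  # 第二层:簇内选臂
    arm = arms[i]
    r = float(rng.random() < true_p[arm])       # 伯努利奖励
    arm_a[arm] += r; arm_b[arm] += 1 - r        # 更新臂级后验
    clu_a[c] += r; clu_b[c] += 1 - r            # 更新簇级后验

print((arm_a / (arm_a + arm_b)).round(2))       # 好簇中好臂的后验均值应最高
```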

【25】 Estimating Leaf Water Content using Remotely Sensed Hyperspectral Data 标题:利用遥感高光谱数据估算叶片含水量 链接:https://arxiv.org/abs/2109.02250

作者:Vishal Vinod,Rahul Raj,Rohit Pingale,Adinarayana Jagarlapudi 机构:Indian Institute of Science Bangalore, Indian Institute of Technology, Bombay 备注:ICCV 2021 CVPPA Workshop Extended Abstract 摘要:植物水分胁迫可能是由于根系/土壤水分供应有限或蒸腾作用增加所致。这些因素对植物生理和光合能力产生不利影响,已证明对生长和产量都有抑制作用[18]。及早识别植物水分胁迫状态,就能采取适当的纠正措施,以获得预期的作物产量。此外,通过精准农业方法提高作物产量是气候政策和联合国可持续发展目标的关键组成部分[1]。叶片含水量(LWC)是一种可用于估算含水量和识别受胁迫植物的指标。作物生长早期的LWC是植物生产力和产量的重要指标。水分胁迫的影响可能是瞬时的(影响气体交换)[15],也可能是长期的、导致显著下降的[9,18,22]。因此,有必要在生长的早期阶段识别潜在的植物水分胁迫[15],以引入纠正性灌溉并缓解胁迫。LWC也有助于通过测量LWC的稳定性(即使在人工诱导的水分胁迫下)来确定耐水分胁迫和盐分的植物基因型[18,25]。此类实验通常采用破坏性程序来获得LWC,既耗时又费力。因此,本研究开发了一种从无人机高光谱数据估算LWC的无损方法。 摘要:Plant water stress may occur due to the limited availability of water to the roots/soil or due to increased transpiration. These factors adversely affect plant physiology and photosynthetic ability to the extent that it has been shown to have inhibitory effects in both growth and yield [18]. Early identification of plant water stress status enables suitable corrective measures to be applied to obtain the expected crop yield. Further, improving crop yield through precision agriculture methods is a key component of climate policy and the UN sustainable development goals [1]. Leaf water content (LWC) is a measure that can be used to estimate water content and identify stressed plants. LWC during the early crop growth stages is an important indicator of plant productivity and yield. The effect of water stress can be instantaneous [15], affecting gaseous exchange or long-term, significantly reducing [9, 18, 22]. It is thus necessary to identify potential plant water stress during the early stages of growth [15] to introduce corrective irrigation and alleviate stress. LWC is also useful for identifying plant genotypes that are tolerant to water stress and salinity by measuring the stability of LWC even under artificially induced water stress [18, 25]. Such experiments generally employ destructive procedures to obtain the LWC, which is time-consuming and labor intensive. Accordingly, this research has developed a non-destructive method to estimate LWC from UAV-based hyperspectral data.

【26】 Scalable Feature Selection for (Multitask) Gradient Boosted Trees 标题:(多任务)梯度增强树的可伸缩特征选择 链接:https://arxiv.org/abs/2109.01965

作者:Cuize Han,Nikhil Rao,Daria Sorokina,Karthik Subbian 机构:Amazon, Palo Alto, CA 备注:None 摘要:梯度增强决策树(GBDT)被广泛用于构建搜索和推荐中的排序与相关性模型。出于延迟和可解释性等考虑,需要用尽可能少的特征来训练这些模型。GBDT模型中的特征选择通常是按重要性对特征做启发式排序并选取前几个,或执行完整的向后特征消除流程。此前提出的即时特征选择方法随特征数量的伸缩性欠佳,这在高维情形下可能令人望而却步。我们借助一种在高维下表现良好的新颖分组测试过程,为GBDT开发了一种可扩展的前向特征选择变体,它具有良好的理论性能和计算保证。我们通过在公开和专有数据集上的大量实验表明,所提出的方法在训练时间上带来显著加速,同时在模型性能指标方面与现有GBDT方法同样具有竞争力。我们还将该方法扩展到多任务设置,允许实践者跨任务选择公共特征,以及选择特定于任务的特征。 摘要:Gradient Boosted Decision Trees (GBDTs) are widely used for building ranking and relevance models in search and recommendation. Considerations such as latency and interpretability dictate the use of as few features as possible to train these models. Feature selection in GBDT models typically involves heuristically ranking the features by importance and selecting the top few, or by performing a full backward feature elimination routine. On-the-fly feature selection methods proposed previously scale suboptimally with the number of features, which can be daunting in high dimensional settings. We develop a scalable forward feature selection variant for GBDT, via a novel group testing procedure that works well in high dimensions, and enjoys favorable theoretical performance and computational guarantees. We show via extensive experiments on both public and proprietary datasets that the proposed method offers significant speedups in training time, while being as competitive as existing GBDT methods in terms of model performance metrics. We also extend the method to the multitask setting, allowing the practitioner to select common features across tasks, as well as selecting task-specific features.
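下面的 scikit-learn 示意展示“分组测试 + 前向选择”的思路(分组方式与评分细节和论文不同,仅为此处的假设):每轮先随机把候选特征分组、挑整体增益最大的组,再在组内挑最优的单个特征,从而避免逐一评估全部候选:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 20))
y = 2 * X[:, 3] + X[:, 7] - X[:, 11] + rng.normal(0, 0.1, 300)  # 真特征 3/7/11

def score(cols):
    """用小型 GBDT 的 3 折交叉验证 R² 评估一组特征。"""
    model = GradientBoostingRegressor(n_estimators=50, random_state=0)
    return cross_val_score(model, X[:, cols], y, cv=3).mean()

selected, remaining = [], list(range(20))
for _ in range(3):                                   # 前向选择 3 个特征
    groups = np.array_split(rng.permutation(remaining), 4)
    # 组级测试:先挑加入后整体得分最高的组
    best_group = max(groups, key=lambda g: score(selected + list(g)))
    # 组内细化:再在该组中挑单个最优特征
    best_feat = int(max(best_group, key=lambda f: score(selected + [f])))
    selected.append(best_feat)
    remaining.remove(best_feat)
print(selected)                                      # 期望找回 {3, 7, 11}
```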

【27】 High-quality Thermal Gibbs Sampling with Quantum Annealing Hardware 标题:用量子退火硬件实现高质量的热吉布斯采样 链接:https://arxiv.org/abs/2109.01690

作者:Jon Nelson,Marc Vuffray,Andrey Y. Lokhov,Tameem Albash,Carleton Coffrin 机构:Advanced Network Science Initiative, Los Alamos National Laboratory, Los Alamos, NM, USA, Theoretical Division, Los Alamos National Laboratory, Los Alamos, NM, USA, Department of Electrical and Computer Engineering 摘要:量子退火(QA)最初是为了加速求解可自然编码为伊辛模型的组合优化任务。然而,最近在QA硬件平台上的实验表明,在对应于弱相互作用的工作区间内,QA硬件的行为类似于一个处于特定硬件有效温度下的带噪Gibbs采样器。这项工作基于这些见解,确定了一类对噪声影响具有鲁棒性的小型硬件原生伊辛模型,并提出了一种在QA硬件上执行这些模型的新流程,以最大限度地提高Gibbs采样性能。实验结果表明,所提出的协议可以从特定硬件的有效温度获得高质量的Gibbs样本,并且QA退火时间可以用来调节输出分布的有效温度。这项工作中提出的流程为使用QA硬件进行伊辛模型采样提供了一种新方法,为机器学习和物理模拟应用带来潜在的新机会。 摘要:Quantum Annealing (QA) was originally intended for accelerating the solution of combinatorial optimization tasks that have natural encodings as Ising models. However, recent experiments on QA hardware platforms have demonstrated that, in the operating regime corresponding to weak interactions, the QA hardware behaves like a noisy Gibbs sampler at a hardware-specific effective temperature. This work builds on those insights and identifies a class of small hardware-native Ising models that are robust to noise effects and proposes a novel procedure for executing these models on QA hardware to maximize Gibbs sampling performance. Experimental results indicate that the proposed protocol results in high-quality Gibbs samples from a hardware-specific effective temperature and that the QA annealing time can be used to adjust the effective temperature of the output distribution. The procedure proposed in this work provides a new approach to using QA hardware for Ising model sampling presenting potential new opportunities for applications in machine learning and physics simulation.
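经典单点 Gibbs 采样器是理解上述结论的参照物:文中 QA 硬件在弱耦合区的行为近似于在某个硬件有效温度下运行这样的采样器。下面是一个 NumPy 示意(玩具实例;调节逆温度 beta 对应文中“用退火时间调节有效温度”的效果):

```python
import numpy as np

def gibbs_ising(J, h, beta, steps=5000, seed=0):
    """对伊辛模型 E(s) = -Σ J_ij s_i s_j - Σ h_i s_i 在逆温度 beta 下
    做单点 Gibbs 采样,返回一个自旋构型样本。"""
    rng = np.random.default_rng(seed)
    n = len(h)
    s = rng.choice([-1, 1], size=n)
    for _ in range(steps):
        i = rng.integers(n)
        field = h[i] + J[i] @ s - J[i, i] * s[i]   # 自旋 i 感受到的局部场
        p_up = 1.0 / (1.0 + np.exp(-2.0 * beta * field))
        s[i] = 1 if rng.random() < p_up else -1    # 按条件分布重采样自旋 i
    return s

# 3 自旋玩具实例:一条铁磁耦合、一条反铁磁耦合
J = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, -0.5],
              [0.0, -0.5, 0.0]])
h = np.zeros(3)
print(gibbs_ising(J, h, beta=1.5))
```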

机器翻译,仅供参考
