
Machine Learning Academic Digest [6.25]

By the WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest), published 2021-07-02 17:45:08

Visit www.arxivdaily.com for daily digests with abstracts, covering CS | Physics | Mathematics | Economics | Statistics | Finance | Biology | Electrical Engineering, with search, favorites, posting, and more!

cs.LG: 128 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)

【1】 Fea2Fea: Exploring Structural Feature Correlations via Graph Neural Networks

Authors: Jiaqing Xie, Rex Ying
Affiliations: University of Edinburgh; Stanford University
Note: Submitted to the ECML-PKDD 2021 Graph Embedding and Mining (GEM) workshop
Link: https://arxiv.org/abs/2106.13061
Abstract: Structural features are important features in graph datasets. However, although there are some correlation analyses of features based on covariance, there is no relevant research on exploring structural feature correlation on graphs with graph neural network based models. In this paper, we introduce graph feature-to-feature (Fea2Fea) prediction pipelines in a low dimensional space to explore some preliminary results on structural feature correlation, based on graph neural networks. The results show that there exists high correlation between some of the structural features. A redundant feature combination with initial node features, which is filtered by a graph neural network, has improved classification accuracy in some graph datasets. We compare the difference between concatenation methods on connecting embeddings between features and show that the simplest is the best. We generalize on synthetic geometric graphs and certify the results on prediction difficulty between two structural features.

【2】 Spatial-Temporal Graph ODE Networks for Traffic Flow Forecasting

Authors: Zheng Fang, Qingqing Long, Guojie Song, Kunqing Xie
Affiliations: Key Laboratory of Machine Perception (Ministry of Education), Peking University; Alibaba Group
Link: https://arxiv.org/abs/2106.12931
Abstract: Spatial-temporal forecasting has attracted tremendous attention in a wide range of applications, and traffic flow prediction is a canonical and typical example. The complex and long-range spatial-temporal correlations of traffic flow bring it to a most intractable challenge. Existing works typically utilize shallow graph convolution networks (GNNs) and temporal extracting modules to model spatial and temporal dependencies respectively. However, the representation ability of such models is limited due to: (1) shallow GNNs are incapable of capturing long-range spatial correlations, (2) only spatial connections are considered and a mass of semantic connections are ignored, which are of great importance for a comprehensive understanding of traffic networks. To this end, we propose Spatial-Temporal Graph Ordinary Differential Equation Networks (STGODE). Specifically, we capture spatial-temporal dynamics through a tensor-based ordinary differential equation (ODE); as a result, deeper networks can be constructed and spatial-temporal features are utilized synchronously. To understand the network more comprehensively, a semantical adjacency matrix is considered in our model, and a well-designed temporal dilated convolution structure is used to capture long-term temporal dependencies. We evaluate our model on multiple real-world traffic datasets and superior performance is achieved over state-of-the-art baselines.

【3】 Learnt Sparsification for Interpretable Graph Neural Networks

Authors: Mandeep Rathee, Zijian Zhang, Thorben Funke, Megha Khosla, Avishek Anand
Affiliations: L3S Research Center, Hannover, Germany
Note: 17 pages, 5 figures, 2 tables
Link: https://arxiv.org/abs/2106.12920
Abstract: Graph neural networks (GNNs) have achieved great success on various tasks and fields that require relational modeling. GNNs aggregate node features using the graph structure as inductive biases resulting in flexible and powerful models. However, GNNs remain hard to interpret as the interplay between node features and graph structure is only implicitly learned. In this paper, we propose a novel method called Kedge for explicitly sparsifying the underlying graph by removing unnecessary neighbors. Our key idea is based on a tractable method for sparsification using the Hard Kumaraswamy distribution that can be used in conjunction with any GNN model. Kedge learns edge masks in a modular fashion trained with any GNN allowing for gradient based optimization in an end-to-end fashion. We demonstrate through extensive experiments that our model Kedge can prune a large proportion of the edges with only a minor effect on the test accuracy. Specifically, in the PubMed dataset, Kedge learns to drop more than 80% of the edges with an accuracy drop of merely 2% showing that graph structure has only a small contribution in comparison to node features. Finally, we also show that Kedge effectively counters the over-smoothing phenomena in deep GNNs by maintaining good task performance with increasing GNN layers.
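
The core trick here is a differentiable, hard-saturating edge mask. Below is a minimal sketch of sampling such masks from a stretched-and-clipped ("hard") Kumaraswamy distribution; the stretch limits and the per-edge parameterization are our assumptions in the spirit of hard-concrete relaxations, not the authors' implementation:

```python
import torch

def hard_kumaraswamy_sample(a, b, l=-0.1, r=1.1):
    """Reparameterized sample from a stretched-and-clipped Kumaraswamy
    distribution; masks can saturate at exactly 0 or 1."""
    u = torch.rand_like(a).clamp(1e-6, 1 - 1e-6)
    x = (1.0 - (1.0 - u).pow(1.0 / b)).pow(1.0 / a)  # Kumaraswamy inverse CDF
    return (l + (r - l) * x).clamp(0.0, 1.0)         # stretch, then hard clip

# Toy usage: one learnable (a, b) pair per edge; the sampled mask would
# multiply the messages along each edge inside any GNN layer.
num_edges = 8
log_a = torch.zeros(num_edges, requires_grad=True)
log_b = torch.zeros(num_edges, requires_grad=True)
mask = hard_kumaraswamy_sample(log_a.exp(), log_b.exp())
print(mask)  # differentiable in log_a/log_b; many entries exactly 0 or 1
```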

【4】 Visualizing Graph Neural Networks with CorGIE: Corresponding a Graph to Its Embedding

Authors: Zipeng Liu, Yang Wang, Jürgen Bernard, Tamara Munzner
Link: https://arxiv.org/abs/2106.12839
Abstract: Graph neural networks (GNNs) are a class of powerful machine learning tools that model node relations for making predictions of nodes or links. GNN developers rely on quantitative metrics of the predictions to evaluate a GNN, but similar to many other neural networks, it is difficult for them to understand if the GNN truly learns characteristics of a graph as expected. We propose an approach to corresponding an input graph to its node embedding (aka latent space), a common component of GNNs that is later used for prediction. We abstract the data and tasks, and develop an interactive multi-view interface called CorGIE to instantiate the abstraction. As the key function in CorGIE, we propose the K-hop graph layout to show topological neighbors in hops and their clustering structure. To evaluate the functionality and usability of CorGIE, we present how to use CorGIE in two usage scenarios, and conduct a case study with two GNN experts.

【5】 Simple Truncated SVD based Model for Node Classification on Heterophilic Graphs

Authors: Vijay Lingam, Rahul Ragesh, Arun Iyer, Sundararajan Sellamanickam
Affiliations: Microsoft Research India, Bengaluru, India
Note: Accepted at Deep Learning on Graphs: Method and Applications (DLG-KDD 2021)
Link: https://arxiv.org/abs/2106.12807
Abstract: Graph Neural Networks (GNNs) have shown excellent performance on graphs that exhibit strong homophily with respect to the node labels i.e. connected nodes have same labels. However, they perform poorly on heterophilic graphs. Recent approaches have typically modified aggregation schemes, designed adaptive graph filters, etc. to address this limitation. In spite of this, the performance on heterophilic graphs can still be poor. We propose a simple alternative method that exploits Truncated Singular Value Decomposition (TSVD) of topological structure and node features. Our approach achieves up to ~30% improvement in performance over state-of-the-art methods on heterophilic graphs. This work is an early investigation into methods that differ from aggregation based approaches. Our experimental results suggest that it might be important to explore other alternatives to aggregation methods for heterophilic setting.
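
The abstract states the method only at a high level; a minimal sketch of the idea on random toy data might look as follows (the rank, the concatenation rule, and the linear classifier are our assumptions, not the paper's exact pipeline):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d, k = 200, 32, 16                  # nodes, feature dim, SVD rank
A = (rng.random((n, n)) < 0.05).astype(float)
A = np.maximum(A, A.T)                 # symmetric toy adjacency
X = rng.standard_normal((n, d))        # node features
y = (X[:, 0] > 0).astype(int)          # toy labels driven by a feature

def tsvd(M, k):
    """Rank-k embedding of the rows of M via truncated SVD."""
    U, S, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :k] * S[:k]

# Embed topology and features separately, then concatenate; no aggregation.
Z = np.concatenate([tsvd(A, k), tsvd(X, k)], axis=1)
clf = LogisticRegression(max_iter=1000).fit(Z[:100], y[:100])
print("held-out accuracy:", clf.score(Z[100:], y[100:]))
```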

【6】 Fund2Vec: Mutual Funds Similarity using Graph Learning

Authors: Vipul Satone, Dhruv Desai, Dhagash Mehta
Affiliations: The Vanguard Group, Inc.
Note: 2-column format, 8 pages, 8 figures, 5 tables
Link: https://arxiv.org/abs/2106.12987
Abstract: Identifying similar mutual funds with respect to the underlying portfolios has found many applications in financial services ranging from fund recommender systems, competitors analysis, portfolio analytics, marketing and sales, etc. The traditional methods are either qualitative, and hence prone to biases and often not reproducible, or, are known not to capture all the nuances (non-linearities) among the portfolios from the raw data. We propose a radically new approach to identify similar funds based on the weighted bipartite network representation of funds and their underlying assets data using a sophisticated machine learning method called Node2Vec which learns an embedded low-dimensional representation of the network. We call the embedding Fund2Vec. Ours is the first ever study of the weighted bipartite network representation of the funds-assets network in its original form that identifies structural similarity among portfolios as opposed to merely portfolio overlaps.

Transformer (6 papers)

【1】 Video Swin Transformer

Authors: Ze Liu, Jia Ning, Yue Cao, Yixuan Wei, Zheng Zhang, Stephen Lin, Han Hu
Affiliations: Microsoft Research Asia; University of Science and Technology of China; Huazhong University of Science and Technology; Tsinghua University
Link: https://arxiv.org/abs/2106.13230
Abstract: The vision community is witnessing a modeling shift from CNNs to Transformers, where pure Transformer architectures have attained top accuracy on the major video recognition benchmarks. These video models are all built on Transformer layers that globally connect patches across the spatial and temporal dimensions. In this paper, we instead advocate an inductive bias of locality in video Transformers, which leads to a better speed-accuracy trade-off compared to previous approaches which compute self-attention globally even with spatial-temporal factorization. The locality of the proposed video architecture is realized by adapting the Swin Transformer designed for the image domain, while continuing to leverage the power of pre-trained image models. Our approach achieves state-of-the-art accuracy on a broad range of video recognition benchmarks, including on action recognition (84.9 top-1 accuracy on Kinetics-400 and 86.1 top-1 accuracy on Kinetics-600 with ~20x less pre-training data and ~3x smaller model size) and temporal modeling (69.6 top-1 accuracy on Something-Something v2). The code and models will be made publicly available at https://github.com/SwinTransformer/Video-Swin-Transformer.

【2】 Exploring Corruption Robustness: Inductive Biases in Vision Transformers and MLP-Mixers

Authors: Katelyn Morrison, Benjamin Gilby, Colton Lipchak, Adam Mattioli, Adriana Kovashka
Note: Under review at the Uncertainty and Robustness in Deep Learning workshop at ICML 2021. The appendix is attached to the last page of the paper.
Link: https://arxiv.org/abs/2106.13122
Abstract: Recently, vision transformers and MLP-based models have been developed in order to address some of the prevalent weaknesses in convolutional neural networks. Due to the novelty of transformers being used in this domain along with the self-attention mechanism, it remains unclear to what degree these architectures are robust to corruptions. Despite some works proposing that data augmentation remains essential for a model to be robust against corruptions, we propose to explore the impact that the architecture has on corruption robustness. We find that vision transformer architectures are inherently more robust to corruptions than the ResNet-50 and MLP-Mixers. We also find that vision transformers with 5 times fewer parameters than a ResNet-50 have more shape bias. Our code is available for reproducibility.

【3】 Pre-training transformer-based framework on large-scale pediatric claims data for downstream population-specific tasks

Authors: Xianlong Zeng, Simon Lin, Chang Liu
Affiliations: Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, USA; Research Information Solutions and Innovation, Nationwide Children's Hospital, Columbus, Ohio, USA
Link: https://arxiv.org/abs/2106.13095
Abstract: The adoption of electronic health records (EHR) has become universal during the past decade, which has afforded in-depth data-based research. By learning from the large amount of healthcare data, various data-driven models have been built to predict future events for different medical tasks, such as auto diagnosis and heart-attack prediction. Although EHR is abundant, the population that satisfies specific criteria for learning population-specific tasks is scarce, making it challenging to train data-hungry deep learning models. This study presents the Claim Pre-Training (Claim-PT) framework, a generic pre-training model that first trains on the entire pediatric claims dataset, followed by a discriminative fine-tuning on each population-specific task. The semantic meaning of medical events can be captured in the pre-training stage, and the effective knowledge transfer is completed through the task-aware fine-tuning stage. The fine-tuning process requires minimal parameter modification without changing the model architecture, which mitigates the data scarcity issue and helps train the deep learning model adequately on small patient cohorts. We conducted experiments on a real-world claims dataset with more than one million patient records. Experimental results on two downstream tasks demonstrated the effectiveness of our method: our general task-agnostic pre-training framework outperformed tailored task-specific models, achieving more than 10% higher in model performance as compared to baselines. In addition, our framework showed a great generalizability potential to transfer learned knowledge from one institution to another, paving the way for future healthcare model pre-training across institutions.

【4】 Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting

Authors: Haixu Wu, Jiehui Xu, Jianmin Wang, Mingsheng Long
Affiliations: School of Software, BNRist, Tsinghua University, China
Link: https://arxiv.org/abs/2106.13008
Abstract: Extending the forecasting time is a critical demand for real applications, such as extreme weather early warning and long-term energy consumption planning. This paper studies the long-term forecasting problem of time series. Prior Transformer-based models adopt various self-attention mechanisms to discover the long-range dependencies. However, intricate temporal patterns of the long-term future prohibit the model from finding reliable dependencies. Also, Transformers have to adopt the sparse versions of point-wise self-attentions for long series efficiency, resulting in the information utilization bottleneck. Towards these challenges, we propose Autoformer as a novel decomposition architecture with an Auto-Correlation mechanism. We go beyond the pre-processing convention of series decomposition and renovate it as a basic inner block of deep models. This design empowers Autoformer with progressive decomposition capacities for complex time series. Further, inspired by the stochastic process theory, we design the Auto-Correlation mechanism based on the series periodicity, which conducts the dependencies discovery and representation aggregation at the sub-series level. Auto-Correlation outperforms self-attention in both efficiency and accuracy. In long-term forecasting, Autoformer yields state-of-the-art accuracy, with a 38% relative improvement on six benchmarks, covering five practical applications: energy, traffic, economics, weather and disease.
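
Two of the named ingredients are easy to make concrete: series decomposition via a moving average, and period discovery via FFT-based autocorrelation (Wiener-Khinchin). The kernel size and the top-k lag selection below are our assumptions, not the paper's configuration:

```python
import math
import torch
import torch.nn.functional as F

def series_decomp(x, kernel=25):
    """Split a (batch, channel, time) series into seasonal and trend parts."""
    pad = kernel // 2
    trend = F.avg_pool1d(F.pad(x, (pad, pad), mode="replicate"), kernel, stride=1)
    return x - trend, trend

def autocorrelation(x):
    """Lag-wise autocorrelation of a series via FFT (Wiener-Khinchin)."""
    f = torch.fft.rfft(x, dim=-1)
    return torch.fft.irfft(f * torch.conj(f), n=x.shape[-1], dim=-1)

x = torch.sin(torch.linspace(0, 12 * math.pi, 96)).reshape(1, 1, -1)
seasonal, trend = series_decomp(x)
acf = autocorrelation(seasonal)
lags = torch.topk(acf[..., 1:48], k=2).indices.flatten() + 1
print("dominant lags:", lags.tolist())  # the 16-sample period should stand out
```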

【5】 Charformer: Fast Character Transformers via Gradient-based Subword Tokenization

Authors: Yi Tay, Vinh Q. Tran, Sebastian Ruder, Jai Gupta, Hyung Won Chung, Dara Bahri, Zhen Qin, Simon Baumgartner, Cong Yu, Donald Metzler
Affiliations: Google Research and DeepMind
Link: https://arxiv.org/abs/2106.12672
Abstract: State-of-the-art models in natural language processing rely on separate rigid subword tokenization algorithms, which limit their generalization ability and adaptation to new settings. In this paper, we propose a new model inductive bias that learns a subword tokenization end-to-end as part of the model. To this end, we introduce a soft gradient-based subword tokenization module (GBST) that automatically learns latent subword representations from characters in a data-driven fashion. Concretely, GBST enumerates candidate subword blocks and learns to score them in a position-wise fashion using a block scoring network. We additionally introduce Charformer, a deep Transformer model that integrates GBST and operates on the byte level. Via extensive experiments on English GLUE, multilingual, and noisy text datasets, we show that Charformer outperforms a series of competitive byte-level baselines while generally performing on par and sometimes outperforming subword-based models. Additionally, Charformer is fast, improving the speed of both vanilla byte-level and subword-level Transformers by 28%-100% while maintaining competitive quality. We believe this work paves the way for highly performant token-free models that are trained completely end-to-end.

【6】 Transformer-based unsupervised patient representation learning based on medical claims for risk stratification and analysis

Authors: Xianlong Zeng, Simon Lin, Chang Liu
Affiliations: Electrical Engineering and Computer Science, Ohio University, Athens, Ohio, USA; Research Information Solutions and Innovation, Nationwide Children's Hospital, Columbus, Ohio, USA
Link: https://arxiv.org/abs/2106.12658
Abstract: The claims data, containing medical codes, services information, and incurred expenditure, can be a good resource for estimating an individual's health condition and medical risk level. In this study, we developed Transformer-based Multimodal AutoEncoder (TMAE), an unsupervised learning framework that can learn efficient patient representation by encoding meaningful information from the claims data. TMAE is motivated by the practical needs in healthcare to stratify patients into different risk levels for improving care delivery and management. Compared to previous approaches, TMAE is able to 1) model inpatient, outpatient, and medication claims collectively, 2) handle irregular time intervals between medical events, 3) alleviate the sparsity issue of the rare medical codes, and 4) incorporate medical expenditure information. We trained TMAE using a real-world pediatric claims dataset containing more than 600,000 patients and compared its performance with various approaches in two clustering tasks. Experimental results demonstrate that TMAE has superior performance compared to all baselines. Multiple downstream applications are also conducted to illustrate the effectiveness of our framework. The promising results confirm that the TMAE framework is scalable to large claims data and is able to generate efficient patient embeddings for risk stratification and analysis.

GAN | adversarial | attacks | generation (7 papers)

【1】 A Deep Learning Approach to Private Data Sharing of Medical Images Using Conditional GANs

Authors: Hanxi Sun, Jason Plawinski, Sajanth Subramaniam, Amir Jamaludin, Timor Kadir, Aimee Readie, Gregory Ligozio, David Ohlssen, Mark Baillie, Thibaud Coroller
Affiliations: Department of Statistics, Purdue University, West Lafayette, IN, USA; Novartis, Basel, Switzerland; Oxford Big Data Institute, Oxford, UK; Novartis, East Hanover, NJ, USA
Link: https://arxiv.org/abs/2106.13199
Abstract: Sharing data from clinical studies can facilitate innovative data-driven research and ultimately lead to better public health. However, sharing biomedical data can put sensitive personal information at risk. This is usually solved by anonymization, which is a slow and expensive process. An alternative to anonymization is sharing a synthetic dataset that bears a behaviour similar to the real data but preserves privacy. As part of the collaboration between Novartis and the Oxford Big Data Institute, we generate a synthetic dataset based on COSENTYX (secukinumab) Ankylosing Spondylitis (AS) clinical study. We apply an Auxiliary Classifier GAN (ac-GAN) to generate synthetic magnetic resonance images (MRIs) of vertebral units (VUs). The images are conditioned on the VU location (cervical, thoracic and lumbar). In this paper, we present a method for generating a synthetic dataset and conduct an in-depth analysis of its properties along three key metrics: image fidelity, sample diversity and dataset privacy.

【2】 Unsupervised Learning of Depth and Depth-of-Field Effect from Natural Images with Aperture Rendering Generative Adversarial Networks

Authors: Takuhiro Kaneko
Affiliations: NTT Communication Science Laboratories, NTT Corporation
Note: Accepted to CVPR 2021 (Oral). Project page: this https URL
Link: https://arxiv.org/abs/2106.13041
Abstract: Understanding the 3D world from 2D projected natural images is a fundamental challenge in computer vision and graphics. Recently, an unsupervised learning approach has garnered considerable attention owing to its advantages in data collection. However, to mitigate training limitations, typical methods need to impose assumptions for viewpoint distribution (e.g., a dataset containing various viewpoint images) or object shape (e.g., symmetric objects). These assumptions often restrict applications; for instance, the application to non-rigid objects or images captured from similar viewpoints (e.g., flower or bird images) remains a challenge. To complement these approaches, we propose aperture rendering generative adversarial networks (AR-GANs), which equip aperture rendering on top of GANs, and adopt focus cues to learn the depth and depth-of-field (DoF) effect of unlabeled natural images. To address the ambiguities triggered by unsupervised setting (i.e., ambiguities between smooth texture and out-of-focus blurs, and between foreground and background blurs), we develop DoF mixture learning, which enables the generator to learn real image distribution while generating diverse DoF images. In addition, we devise a center focus prior to guiding the learning direction. In the experiments, we demonstrate the effectiveness of AR-GANs in various datasets, such as flower, bird, and face images, demonstrate their portability by incorporating them into other 3D representation learning GANs, and validate their applicability in shallow DoF rendering.

【3】 SofaMyRoom: a fast and multiplatform "shoebox" room simulator for binaural room impulse response dataset generation

Authors: Roberto Barumerli, Daniele Bianchi, Michele Geronazzo, Federico Avanzini
Affiliations: Acoustic Research Institute, Austrian Academy of Sciences, Vienna, Austria; Dept. of Computer Science, University of Milan, Milan, Italy; Dyson School of Design Engineering, Imperial College London, London, United Kingdom
Note: 18 pages, 4 figures; accompanying paper for an acoustic simulator description
Link: https://arxiv.org/abs/2106.12992
Abstract: This paper introduces a shoebox room simulator able to systematically generate synthetic datasets of binaural room impulse responses (BRIRs) given an arbitrary set of head-related transfer functions (HRTFs). The evaluation of machine hearing algorithms frequently requires BRIR datasets in order to simulate the acoustics of any environment. However, currently available solutions typically consider only HRTFs measured on dummy heads, which poorly characterize the high variability in spatial sound perception. Our solution allows to integrate a room impulse response (RIR) simulator with different HRTF sets represented in Spatially Oriented Format for Acoustics (SOFA). The source code and the compiled binaries for different operating systems allow both advanced and non-expert users to benefit from our toolbox; see https://github.com/spatialaudiotools/sofamyroom/.

【4】 Abstraction of Markov Population Dynamics via Generative Adversarial Nets

Authors: Francesca Cairoli, Ginevra Carbone, Luca Bortolussi
Affiliations: Department of Mathematics and Geosciences, University of Trieste, Italy; Modeling and Simulation Group, Saarland University, Germany
Link: https://arxiv.org/abs/2106.12981
Abstract: Markov Population Models are a widespread formalism used to model the dynamics of complex systems, with applications in Systems Biology and many other fields. The associated Markov stochastic process in continuous time is often analyzed by simulation, which can be costly for large or stiff systems, particularly when a massive number of simulations has to be performed (e.g. in a multi-scale model). A strategy to reduce computational load is to abstract the population model, replacing it with a simpler stochastic model, faster to simulate. Here we pursue this idea, building on previous works and constructing a generator capable of producing stochastic trajectories in continuous space and discrete time. This generator is learned automatically from simulations of the original model in a Generative Adversarial setting. Compared to previous works, which rely on deep neural networks and Dirichlet processes, we explore the use of state of the art generative models, which are flexible enough to learn a full trajectory rather than a single transition kernel.

【5】 Long-term Cross Adversarial Training: A Robust Meta-learning Method for Few-shot Classification Tasks

Authors: Fan Liu, Shuyu Zhao, Xuelong Dai, Bin Xiao
Affiliations: Department of Computing, The Hong Kong Polytechnic University
Link: https://arxiv.org/abs/2106.12900
Abstract: Meta-learning model can quickly adapt to new tasks using few-shot labeled data. However, despite achieving good generalization on few-shot classification tasks, it is still challenging to improve the adversarial robustness of the meta-learning model in few-shot learning. Although adversarial training (AT) methods such as Adversarial Query (AQ) can improve the adversarially robust performance of meta-learning models, AT is still computationally expensive training. On the other hand, meta-learning models trained with AT will drop significant accuracy on the original clean images. This paper proposed a meta-learning method on the adversarially robust neural network called Long-term Cross Adversarial Training (LCAT). LCAT will update meta-learning model parameters cross along the natural and adversarial sample distribution direction with long-term to improve both adversarial and clean few-shot classification accuracy. Due to cross-adversarial training, LCAT only needs half of the adversarial training epoch than AQ, resulting in a low adversarial training computation. Experiment results show that LCAT achieves superior performance both on the clean and adversarial few-shot classification accuracy than SOTA adversarial training methods for meta-learning models.
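
The abstract does not spell out the cross-update schedule, so as a reference point here is the standard adversarial-training step such methods build on: a one-step FGSM attack plus a weighted clean/adversarial loss. The epsilon, the mixing weight alpha, and the toy model are our assumptions, not LCAT itself:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, eps=8 / 255):
    """One-step adversarial example used inside adversarial training."""
    x = x.clone().requires_grad_(True)
    grad, = torch.autograd.grad(F.cross_entropy(model(x), y), x)
    return (x + eps * grad.sign()).clamp(0, 1).detach()

def mixed_step(model, opt, x, y, alpha=0.5):
    """Weighted clean + adversarial loss: the clean/robust trade-off
    that LCAT aims to balance."""
    x_adv = fgsm_example(model, x, y)
    loss = alpha * F.cross_entropy(model(x), y) \
        + (1 - alpha) * F.cross_entropy(model(x_adv), y)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

# Toy usage with a linear classifier on random "images".
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(28 * 28, 10))
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.rand(16, 1, 28, 28), torch.randint(0, 10, (16,))
print(mixed_step(model, opt, x, y))
```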

【6】 Dungeon and Platformer Level Blending and Generation using Conditional VAEs

Authors: Anurag Sarkar, Seth Cooper
Affiliations: Northeastern University, Boston, MA, USA
Link: https://arxiv.org/abs/2106.12692
Abstract: Variational autoencoders (VAEs) have been used in prior works for generating and blending levels from different games. To add controllability to these models, conditional VAEs (CVAEs) were recently shown capable of generating output that can be modified using labels specifying desired content, albeit working with segments of levels and platformers exclusively. We expand these works by using CVAEs for generating whole platformer and dungeon levels, and blending levels across these genres. We show that CVAEs can reliably control door placement in dungeons and progression direction in platformer levels. Thus, by using appropriate labels, our approach can generate whole dungeons and platformer levels of interconnected rooms and segments respectively as well as levels that blend dungeons and platformers. We demonstrate our approach using The Legend of Zelda, Metroid, Mega Man and Lode Runner.

【7】 Adversarial Examples in Multi-Layer Random ReLU Networks

Authors: Peter L. Bartlett, Sébastien Bubeck, Yeshwanth Cherapanamjeri
Affiliations: Department of Electrical Engineering and Computer Science and Department of Statistics, UC Berkeley; Microsoft Research Redmond
Link: https://arxiv.org/abs/2106.12611
Abstract: We consider the phenomenon of adversarial examples in ReLU networks with independent gaussian parameters. For networks of constant depth and with a large range of widths (for instance, it suffices if the width of each layer is polynomial in that of any other layer), small perturbations of input vectors lead to large changes of outputs. This generalizes results of Daniely and Schacham (2020) for networks of rapidly decreasing width and of Bubeck et al (2021) for two-layer networks. The proof shows that adversarial examples arise in these networks because the functions that they compute are very close to linear. Bottleneck layers in the network play a key role: the minimal width up to some point in the network determines scales and sensitivities of mappings computed up to that point. The main result is for networks with constant depth, but we also show that some constraint on depth is necessary for a result of this kind, because there are suitably deep networks that, with constant probability, compute a function that is close to constant.
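
The statement is easy to probe numerically: draw a constant-depth Gaussian ReLU network and move a small step along the gradient direction. The widths, step size, and finite-difference gradient below are our choices for a toy demonstration, not the paper's construction:

```python
import numpy as np

rng = np.random.default_rng(0)
d, widths = 256, [256, 256, 256, 1]   # constant depth, comparable widths

# i.i.d. Gaussian weights, scaled so activations stay O(1).
Ws, fan_in = [], d
for w in widths:
    Ws.append(rng.standard_normal((w, fan_in)) / np.sqrt(fan_in))
    fan_in = w

def net(x):
    for W in Ws[:-1]:
        x = np.maximum(W @ x, 0.0)
    return float(Ws[-1] @ x)

x = rng.standard_normal(d) / np.sqrt(d)      # roughly unit-norm input
# Finite-difference gradient, then a 10%-of-|x| step along it.
g = np.array([(net(x + 1e-5 * e) - net(x)) / 1e-5 for e in np.eye(d)])
delta = 0.1 * np.linalg.norm(x) * g / np.linalg.norm(g)
print("f(x)       =", net(x))
print("f(x+delta) =", net(x + delta))        # changes by ~0.1*|grad|, large
```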

Semi-/weakly-/un-/fully-supervised | uncertainty | active learning (8 papers)

【1】 Shallow Representation is Deep: Learning Uncertainty-aware and Worst-case Random Feature Dynamics

Authors: Diego Agudelo-España, Yassine Nemmour, Bernhard Schölkopf, Jia-Jie Zhu
Affiliations: Max Planck Institute for Intelligent Systems
Note: 9 pages, 5 figures
Link: https://arxiv.org/abs/2106.13066
Abstract: Random features is a powerful universal function approximator that inherits the theoretical rigor of kernel methods and can scale up to modern learning tasks. This paper views uncertain system models as unknown or uncertain smooth functions in universal reproducing kernel Hilbert spaces. By directly approximating the one-step dynamics function using random features with uncertain parameters, which are equivalent to a shallow Bayesian neural network, we then view the whole dynamical system as a multi-layer neural network. Exploiting the structure of Hamiltonian dynamics, we show that finding worst-case dynamics realizations using Pontryagin's minimum principle is equivalent to performing the Frank-Wolfe algorithm on the deep net. Various numerical experiments on dynamics learning showcase the capacity of our modeling methodology.
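
As a concrete anchor for the first ingredient, here is a sketch of fitting a one-step dynamics map with random Fourier features and ridge regression (the posterior mean of Bayesian linear regression on the features). The toy system, bandwidth, and feature count are our assumptions, and the worst-case Frank-Wolfe search is not shown:

```python
import numpy as np

rng = np.random.default_rng(0)

# Unknown one-step dynamics to learn: x' = f(x) (scalar toy system).
f_true = lambda x: np.sin(3 * x) + 0.5 * x
X = rng.uniform(-2, 2, size=400)
Y = f_true(X) + 0.01 * rng.standard_normal(400)

# Random Fourier features approximating an RBF kernel of bandwidth `bw`.
D, bw = 200, 1.0
w = rng.standard_normal(D) / bw
b = rng.uniform(0, 2 * np.pi, D)
phi = lambda x: np.sqrt(2.0 / D) * np.cos(np.outer(x, w) + b)

# Ridge regression on the features = posterior mean of Bayesian linear regression.
lam = 1e-3
Phi = phi(X)
theta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(D), Phi.T @ Y)
xs = np.linspace(-2, 2, 5)
print(np.c_[f_true(xs), phi(xs) @ theta])  # true vs. learned one-step map
```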

【2】 Unsupervised Topic Segmentation of Meetings with BERT Embeddings

Authors: Alessandro Solbiati, Kevin Heffernan, Georgios Damaskinos, Shivani Poddar, Shubham Modi, Jacques Cali
Affiliations: Facebook, Inc.
Link: https://arxiv.org/abs/2106.12978
Abstract: Topic segmentation of meetings is the task of dividing multi-person meeting transcripts into topic blocks. Supervised approaches to the problem have proven intractable due to the difficulties in collecting and accurately annotating large datasets. In this paper we show how previous unsupervised topic segmentation methods can be improved using pre-trained neural architectures. We introduce an unsupervised approach based on BERT embeddings that achieves a 15.5% reduction in error rate over existing unsupervised approaches applied to two popular datasets for meeting transcripts.
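
The abstract leaves the algorithm to the paper; a common unsupervised recipe in this family, and the assumption behind this sketch, is TextTiling-style boundary detection on cosine similarities between adjacent BERT sentence embeddings (here the embeddings are simulated):

```python
import numpy as np

def segment_by_similarity_dips(emb, depth_threshold=0.3):
    """Place topic boundaries where similarity between adjacent utterances
    dips. `emb` is an (n_utterances, dim) array of sentence embeddings,
    e.g. from a pre-trained BERT encoder (obtaining them is out of scope)."""
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sims = np.einsum("ij,ij->i", emb[:-1], emb[1:])  # cosine of neighbours
    boundaries = []
    for i in range(1, len(sims) - 1):
        # TextTiling-style depth: how far this valley sits below its peaks.
        depth = (max(sims[:i + 1]) - sims[i]) + (max(sims[i:]) - sims[i])
        if depth > depth_threshold and sims[i] < sims[i - 1] and sims[i] < sims[i + 1]:
            boundaries.append(i + 1)                 # boundary before i+1
    return boundaries

# Toy check: two well-separated "topics" in embedding space.
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(1.0, 0.1, (5, 8)), rng.normal(-1.0, 0.1, (5, 8))])
print(segment_by_similarity_dips(emb))  # expect a boundary at index 5
```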

【3】 Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated Scenes

Authors: Benjamin Keltjens, Tom van Dijk, Guido C.H.E. de Croon
Affiliations: Delft University of Technology
Link: https://arxiv.org/abs/2106.12958
Abstract: Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation. Although these methods show strong results on outdoor datasets such as KITTI, they do not match performance of supervised methods on indoor environments with camera rotation. Indoor, rotated scenes are common for less constrained applications and pose problems for two reasons: abundance of low texture regions and increased complexity of depth cues for images under rotation. In an effort to extend self-supervised learning to more generalised environments we propose two additions. First, we propose a novel Filled Disparity Loss term that corrects for ambiguity of image reconstruction error loss in textureless regions. Specifically, we interpolate disparity in untextured regions, using the estimated disparity from surrounding textured areas, and use L1 loss to correct the original estimation. Our experiments show that depth estimation is substantially improved on low-texture scenes, without any loss on textured scenes, when compared to Monodepth by Godard et al. Secondly, we show that training with an application's representative rotations, in both pitch and roll, is sufficient to significantly improve performance over the entire range of expected rotation. We demonstrate that depth estimation is successfully generalised as performance is not lost when evaluated on test sets with no camera rotation. Together these developments enable a broader use of self-supervised learning of monocular depth estimation for complex environments.

【4】 Factors affecting the COVID-19 risk in the US counties: an innovative approach by combining unsupervised and supervised learning

Authors: Samira Ziyadidegan, Moein Razavi, Homa Pesarakli, Amir Hossein Javid, Madhav Erraguntla
Affiliations: Department of Computer Science and Engineering, Texas A&M University, College Station, TX; Department of Industrial and Systems Engineering, Texas A&M University, College Station, TX
Note: 15 pages, 8 figures, 5 tables
Link: https://arxiv.org/abs/2106.12766
Abstract: The COVID-19 disease spreads swiftly, and nearly three months after the first positive case was confirmed in China, Coronavirus started to spread all over the United States. Some states and counties reported high number of positive cases and deaths, while some reported lower COVID-19 related cases and mortality. In this paper, the factors that could affect the risk of COVID-19 infection and mortality were analyzed in county level. An innovative method by using K-means clustering and several classification models is utilized to determine the most critical factors. Results showed that mean temperature, percent of people below poverty, percent of adults with obesity, air pressure, population density, wind speed, longitude, and percent of uninsured people were the most significant attributes.
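
A minimal sketch of the two-stage shape described (unsupervised grouping, then a supervised model whose feature importances rank the factors); the feature names and synthetic data below are stand-ins, not the study's dataset:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
features = ["mean_temp", "poverty_pct", "obesity_pct", "air_pressure",
            "pop_density", "wind_speed", "longitude", "uninsured_pct"]
X = rng.standard_normal((300, len(features)))   # stand-in for county data

# Step 1: unsupervised risk groups via K-means (the study clusters counties
# by their COVID-19 outcomes; here we cluster the features for lack of data).
risk_group = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Step 2: a supervised model predicts the cluster; its importances rank factors.
clf = RandomForestClassifier(random_state=0).fit(X, risk_group)
for name, imp in sorted(zip(features, clf.feature_importances_),
                        key=lambda t: -t[1]):
    print(f"{name:>14s}  {imp:.3f}")
```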

【5】 SALT: Sea lice Adaptive Lattice Tracking -- An Unsupervised Approach to Generate an Improved Ocean Model

Authors: Ju An Park, Vikram Voleti, Kathryn E. Thomas, Alexander Wong, Jason L. Deglint
Affiliations: Université de Montréal, Canada; Ontario Tech University, Canada; University of Waterloo, Canada
Note: 5 pages, 3 figures, 3 tables
Link: https://arxiv.org/abs/2106.13202
Abstract: Warming oceans due to climate change are leading to increased numbers of ectoparasitic copepods, also known as sea lice, which can cause significant ecological loss to wild salmon populations and major economic loss to aquaculture sites. The main transport mechanism driving the spread of sea lice populations are near-surface ocean currents. Present strategies to estimate the distribution of sea lice larvae are computationally complex and limit full-scale analysis. Motivated to address this challenge, we propose SALT: Sea lice Adaptive Lattice Tracking approach for efficient estimation of sea lice dispersion and distribution in space and time. Specifically, an adaptive spatial mesh is generated by merging nodes in the lattice graph of the Ocean Model based on local ocean properties, thus enabling highly efficient graph representation. SALT demonstrates improved efficiency while maintaining consistent results with the standard method, using near-surface current data for Hardangerfjord, Norway. The proposed SALT technique shows promise for enhancing proactive aquaculture management through predictive modelling of sea lice infestation pressure maps in a changing climate.

【6】 Understanding Uncertainty in Bayesian Deep Learning

Authors: Cooper Lorsung
Affiliations: School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts
Note: 97 pages, 32 figures, Master of Engineering thesis
Link: https://arxiv.org/abs/2106.13055
Abstract: Neural Linear Models (NLM) are deep Bayesian models that produce predictive uncertainty by learning features from the data and then performing Bayesian linear regression over these features. Despite their popularity, few works have focused on formally evaluating the predictive uncertainties of these models. Furthermore, existing works point out the difficulties of encoding domain knowledge in models like NLMs, making them unsuitable for applications where interpretability is required. In this work, we show that traditional training procedures for NLMs can drastically underestimate uncertainty in data-scarce regions. We identify the underlying reasons for this behavior and propose a novel training method that can both capture useful predictive uncertainties as well as allow for incorporation of domain knowledge.
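
To make the model class concrete, here is a minimal neural linear model on 1-D toy data: a deterministic feature extractor trained with a plain regression head, followed by closed-form Bayesian linear regression over the frozen features. The architecture, prior precision, and noise level are our assumptions:

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(torch.nn.Linear(1, 64), torch.nn.Tanh(),
                          torch.nn.Linear(64, 64), torch.nn.Tanh())
head = torch.nn.Linear(64, 1)

x = torch.linspace(-3, 3, 100).unsqueeze(1)
y = torch.sin(2 * x) + 0.1 * torch.randn_like(x)
opt = torch.optim.Adam(list(net.parameters()) + list(head.parameters()), lr=1e-2)
for _ in range(500):                          # learn features with an MSE head
    opt.zero_grad()
    ((head(net(x)) - y) ** 2).mean().backward()
    opt.step()

alpha, noise = 1.0, 0.1                       # prior precision, noise std
with torch.no_grad():                         # Bayesian linear regression
    Phi = net(x)
    S = torch.linalg.inv(alpha * torch.eye(64) + Phi.T @ Phi / noise**2)
    m = S @ Phi.T @ y / noise**2              # posterior mean of last layer
    x_test = torch.linspace(-6, 6, 7).unsqueeze(1)
    P = net(x_test)
    mean = P @ m
    std = (noise**2 + (P @ S * P).sum(-1, keepdim=True)).sqrt()
print(torch.cat([x_test, mean, std], dim=1))  # watch std outside [-3, 3]
```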

【7】 Multi-Reference Alignment for sparse signals, Uniform Uncertainty Principles and the Beltway Problem

Authors: Subhro Ghosh, Philippe Rigollet
Affiliations: National University of Singapore
Link: https://arxiv.org/abs/2106.12996
Abstract: Motivated by cutting-edge applications like cryo-electron microscopy (cryo-EM), the Multi-Reference Alignment (MRA) model entails the learning of an unknown signal from repeated measurements of its images under the latent action of a group of isometries and additive noise of magnitude $\sigma$. Despite significant interest, a clear picture for understanding rates of estimation in this model has emerged only recently, particularly in the high-noise regime $\sigma \gg 1$ that is highly relevant in applications. Recent investigations have revealed a remarkable asymptotic sample complexity of order $\sigma^6$ for certain signals whose Fourier transforms have full support, in stark contrast to the traditional $\sigma^2$ that arise in regular models. Often prohibitively large in practice, these results have prompted the investigation of variations around the MRA model where better sample complexity may be achieved. In this paper, we show that sparse signals exhibit an intermediate $\sigma^4$ sample complexity even in the classical MRA model. Our results explore and exploit connections of the MRA estimation problem with two classical topics in applied mathematics: the beltway problem from combinatorial optimization, and uniform uncertainty principles from harmonic analysis.
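
For reference, the MRA observation model the abstract refers to can be written as follows (our notation for the cyclic-shift case):

```latex
% Each observation is a random cyclic shift of the unknown signal x
% plus additive Gaussian noise of magnitude \sigma.
y_j = R_{s_j} x + \sigma \xi_j, \qquad j = 1, \dots, n,
\qquad (R_s x)[i] = x[(i - s) \bmod d], \qquad \xi_j \sim \mathcal{N}(0, I_d).
```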

【8】 Tensor networks for unsupervised machine learning

Authors: Jing Liu, Sujie Li, Jiang Zhang, Pan Zhang
Affiliations: School of Systems Science, Beijing Normal University; CAS Key Laboratory for Theoretical Physics, Institute of Theoretical Physics; School of Physical Sciences, University of Chinese Academy of Sciences, Beijing, China
Note: 12 pages, 9 figures; for the GitHub page, see this https URL
Link: https://arxiv.org/abs/2106.12974
Abstract: Modeling the joint distribution of high-dimensional data is a central task in unsupervised machine learning. In recent years, many interests have been attracted to developing learning models based on tensor networks, which have advantages of theoretical understandings of the expressive power using entanglement properties, and as a bridge connecting the classical computation and the quantum computation. Despite the great potential, however, existing tensor-network-based unsupervised models only work as a proof of principle, as their performances are much worse than the standard models such as the restricted Boltzmann machines and neural networks. In this work, we present the Autoregressive Matrix Product States (AMPS), a tensor-network-based model combining the matrix product states from quantum many-body physics and the autoregressive models from machine learning. The model enjoys exact calculation of normalized probability and unbiased sampling, as well as a clear theoretical understanding of expressive power. We demonstrate the performance of our model using two applications, the generative modeling on synthetic and real-world data, and the reinforcement learning in statistical physics. Using extensive numerical experiments, we show that the proposed model significantly outperforms the existing tensor-network-based models and the restricted Boltzmann machines, and is competitive with the state-of-the-art neural network models.

Transfer | Zero/Few/One-Shot | adaptation (1 paper)

【1】 Study of Robust Adaptive Beamforming Based on Low-Complexity DFT Spatial Sampling

Authors: Saeed Mohammadzadeh, Vitor H. Nascimento, Rodrigo C. de Lamare, Osman Kukrer
Affiliations: University of York
Note: 12 pages, 12 figures
Link: https://arxiv.org/abs/2106.12663
Abstract: In this paper, a novel and robust algorithm is proposed for adaptive beamforming based on the idea of reconstructing the autocorrelation sequence (ACS) of a random process from a set of measured data. This is obtained from the first column and the first row of the sample covariance matrix (SCM) after averaging along its diagonals. Then, the power spectrum of the correlation sequence is estimated using the discrete Fourier transform (DFT). The DFT coefficients corresponding to the angles within the noise-plus-interference region are used to reconstruct the noise-plus-interference covariance matrix (NPICM), while the desired signal covariance matrix (DSCM) is estimated by identifying and removing the noise-plus-interference component from the SCM. In particular, the spatial power spectrum of the estimated received signal is utilized to compute the correlation sequence corresponding to the noise-plus-interference in which the dominant DFT coefficient of the noise-plus-interference is captured. A key advantage of the proposed adaptive beamforming is that only little prior information is required. Specifically, an imprecise knowledge of the array geometry and of the angular sectors in which the interferences are located is needed. Simulation results demonstrate that compared with previous reconstruction-based beamformers, the proposed approach can achieve better overall performance in the case of multiple mismatches over a very large range of input signal-to-noise ratios.
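
The first two steps (diagonal averaging of the sample covariance into an autocorrelation sequence, then a DFT power spectrum whose peaks mark interference angles) are easy to sketch on a toy uniform linear array; the array geometry, source placement, and noise level below are our assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
M, snapshots = 16, 200
# Toy array data: one interferer at -30 degrees (half-wavelength ULA).
theta = np.deg2rad(-30.0)
a = np.exp(1j * np.pi * np.arange(M) * np.sin(theta))   # steering vector
s = rng.standard_normal(snapshots) + 1j * rng.standard_normal(snapshots)
X = np.outer(a, s) + 0.1 * (rng.standard_normal((M, snapshots))
                            + 1j * rng.standard_normal((M, snapshots)))
R = X @ X.conj().T / snapshots                          # sample covariance (SCM)

# Autocorrelation sequence: average the SCM along its diagonals.
acs = np.array([np.mean(np.diag(R, -k)) for k in range(M)])

# Spatial power spectrum of the ACS via DFT; peaks mark interference angles.
spectrum = np.abs(np.fft.fft(acs, 256))
u = np.fft.fftfreq(256)[np.argmax(spectrum)] * 2        # sin(theta) estimate
print("estimated interferer angle:", np.degrees(np.arcsin(u)))
```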

Reinforcement learning (5 papers)

【1】 Model-Based Reinforcement Learning via Latent-Space Collocation

Authors: Oleh Rybkin, Chuning Zhu, Anusha Nagabandi, Kostas Daniilidis, Igor Mordatch, Sergey Levine
Affiliations: University of Pennsylvania; Covariant; Google AI; UC Berkeley
Note: International Conference on Machine Learning (ICML), 2021. Videos and code at this https URL
Link: https://arxiv.org/abs/2106.13229
Abstract: The ability to plan into the future while utilizing only raw high-dimensional observations, such as images, can provide autonomous agents with broad capabilities. Visual model-based reinforcement learning (RL) methods that plan future actions directly have shown impressive results on tasks that require only short-horizon reasoning, however, these methods struggle on temporally extended tasks. We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions, as the effects of actions greatly compound over time and are harder to optimize. To achieve this, we draw on the idea of collocation, which has shown good results on long-horizon tasks in optimal control literature, and adapt it to the image-based setting by utilizing learned latent state space models. The resulting latent collocation method (LatCo) optimizes trajectories of latent states, which improves over previously proposed shooting methods for visual model-based RL on tasks with sparse rewards and long-term goals. Videos and code at https://orybkin.github.io/latco/.
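
A minimal sketch of the collocation idea: optimize the whole latent state trajectory together with the actions, with the dynamics entering as a penalized constraint rather than through a rollout. Toy linear dynamics stand in for the learned latent model, and the penalty weight is our assumption (the paper uses a more careful constrained formulation):

```python
import torch

torch.manual_seed(0)
T, dz, du = 20, 4, 2
A = 0.9 * torch.eye(dz)                  # stand-in for learned latent dynamics
B = 0.1 * torch.randn(dz, du)
z_goal = torch.ones(dz)

z = torch.zeros(T, dz, requires_grad=True)       # decision variables: states...
u = torch.zeros(T - 1, du, requires_grad=True)   # ...and actions
opt = torch.optim.Adam([z, u], lr=0.05)

for _ in range(500):
    opt.zero_grad()
    goal_cost = ((z[-1] - z_goal) ** 2).sum()    # terminal objective
    init_cost = (z[0] ** 2).sum()                # pin the known start state
    residual = z[1:] - (z[:-1] @ A.T + u @ B.T)  # dynamics as soft constraint
    loss = goal_cost + 10.0 * init_cost + 10.0 * (residual ** 2).sum()
    loss.backward()
    opt.step()

print("goal error:", ((z[-1] - z_goal) ** 2).sum().item())
print("dynamics violation:", (residual ** 2).sum().item())
```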

【2】 Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation

Authors: Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko
Affiliations: Columbia University; University of Alberta; DeepMind London; DeepMind Paris
Link: https://arxiv.org/abs/2106.13125
Abstract: Model-agnostic meta-reinforcement learning requires estimating the Hessian matrix of value functions. This is challenging from an implementation perspective, as repeatedly differentiating policy gradient estimates may lead to biased Hessian estimates. In this work, we provide a unifying framework for estimating higher-order derivatives of value functions, based on off-policy evaluation. Our framework interprets a number of prior approaches as special cases and elucidates the bias and variance trade-off of Hessian estimates. This framework also opens the door to a new family of estimates, which can be easily implemented with auto-differentiation libraries, and lead to performance gains in practice.

【3】 The Option Keyboard: Combining Skills in Reinforcement Learning

Authors: André Barreto, Diana Borsa, Shaobo Hou, Gheorghe Comanici, Eser Aygün, Philippe Hamel, Daniel Toyama, Jonathan Hunt, Shibl Mourad, David Silver, Doina Precup
Affiliations: DeepMind
Note: Published at NeurIPS 2019
Link: https://arxiv.org/abs/2106.13105
Abstract: The ability to combine known skills to create new ones may be crucial in the solution of complex reinforcement learning problems that unfold over extended periods. We argue that a robust way of combining skills is to define and manipulate them in the space of pseudo-rewards (or "cumulants"). Based on this premise, we propose a framework for combining skills using the formalism of options. We show that every deterministic option can be unambiguously represented as a cumulant defined in an extended domain. Building on this insight and on previous results on transfer learning, we show how to approximate options whose cumulants are linear combinations of the cumulants of known options. This means that, once we have learned options associated with a set of cumulants, we can instantaneously synthesise options induced by any linear combination of them, without any learning involved. We describe how this framework provides a hierarchical interface to the environment whose abstract actions correspond to combinations of basic skills. We demonstrate the practical benefits of our approach in a resource management problem and a navigation task involving a quadrupedal simulated robot.

【4】 rSoccer: A Framework for Studying Reinforcement Learning in Small and Very Small Size Robot Soccer 标题:rSoccer:一种研究小型和超小型机器人足球强化学习的框架

作者:Felipe B. Martins,Mateus G. Machado,Hansenclever F. Bassani,Pedro H. M. Braga,Edna S. Barros 机构:Centro de Informática, Universidade Federal de Pernambuco, Recife, PE, Brazil 链接:https://arxiv.org/abs/2106.12895 摘要:强化学习是一个活跃的研究领域,在机器人学中有着广泛的应用,而RoboCup竞赛是研究和评价强化学习方法的一个有趣环境。将强化学习应用于机器人的一个已知困难是所需经验样本数量巨大,因此先在模拟环境中训练智能体、再迁移到现实世界(sim-to-real)是一条可行路径。本文介绍了一个面向IEEE超小型足球(Very Small Size Soccer)和小型联赛(Small Size League)的开源模拟器,该模拟器针对强化学习实验进行了优化。我们还提出了一个用于创建OpenAI Gym环境的框架,并给出一组基准任务,用于评估单智能体与多智能体的机器人足球技能。然后,我们展示了两种最先进的强化学习方法的学习能力,以及它们在该框架所引入的特定场景中的局限性。我们相信,这将使更多的团队更容易在这些类别的比赛中使用端到端强化学习方法,并进一步发展这一研究领域。 摘要:Reinforcement learning is an active research area with a vast number of applications in robotics, and the RoboCup competition is an interesting environment for studying and evaluating reinforcement learning methods. A known difficulty in applying reinforcement learning to robotics is the high number of experience samples required, being the use of simulated environments for training the agents followed by transfer learning to real-world (sim-to-real) a viable path. This article introduces an open-source simulator for the IEEE Very Small Size Soccer and the Small Size League optimized for reinforcement learning experiments. We also propose a framework for creating OpenAI Gym environments with a set of benchmarks tasks for evaluating single-agent and multi-agent robot soccer skills. We then demonstrate the learning capabilities of two state-of-the-art reinforcement learning methods as well as their limitations in certain scenarios introduced in this framework. We believe this will make it easier for more teams to compete in these categories using end-to-end reinforcement learning approaches and further develop this research area.
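rSoccer 提供 OpenAI Gym 风格的环境接口;下面是一个假设性的交互示意。环境 id "VSS-v0" 与包名 `rsoccer_gym` 均为猜测,请以官方仓库文档为准。

```python
import gym
import rsoccer_gym  # 导入以注册 rSoccer 环境(假设已安装,包名以官方为准)

env = gym.make("VSS-v0")      # 环境 id 为假设值
obs = env.reset()             # 旧式 Gym 接口;新版 gym/gymnasium 的返回值略有不同
done = False
while not done:
    action = env.action_space.sample()   # 随机策略,仅演示交互循环
    obs, reward, done, info = env.step(action)
env.close()
```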

【5】 Density Constrained Reinforcement Learning 标题:密度约束强化学习

作者:Zengyi Qin,Yuxiao Chen,Chuchu Fan 机构:Massachusetts Institute of Technology, California Institute of Technology 备注:Accepted by ICML, 2021 链接:https://arxiv.org/abs/2106.12764 摘要:我们从一个新的角度研究约束强化学习(CRL):直接在状态密度函数上设置约束,而非以往工作所考虑的值函数。状态密度具有清晰的物理和数学解释,能够表达资源限制、安全要求等各类约束。密度约束还能避免基于值函数的约束在编码系统规范时设计和调试成本函数的耗时过程。我们利用密度函数和Q函数之间的对偶性,提出了一种高效算法来最优地求解密度约束RL问题,并保证约束得到满足。我们证明了即使在策略更新不完美的情况下,该算法也能收敛到误差有界的近似最优解。我们通过一组全面的实验展示了该方法相对于最先进CRL方法的优势,涵盖广泛的密度约束任务以及Safety-Gym等标准CRL基准。 摘要:We study constrained reinforcement learning (CRL) from a novel perspective by setting constraints directly on state density functions, rather than the value functions considered by previous works. State density has a clear physical and mathematical interpretation, and is able to express a wide variety of constraints such as resource limits and safety requirements. Density constraints can also avoid the time-consuming process of designing and tuning cost functions required by value function-based constraints to encode system specifications. We leverage the duality between density functions and Q functions to develop an effective algorithm to solve the density constrained RL problem optimally and the constraints are guaranteed to be satisfied. We prove that the proposed algorithm converges to a near-optimal solution with a bounded error even when the policy update is imperfect. We use a set of comprehensive experiments to demonstrate the advantages of our approach over state-of-the-art CRL methods, with a wide range of density constrained tasks as well as standard CRL benchmarks such as Safety-Gym.

符号|符号学习(1篇)

【1】 FF-NSL: Feed-Forward Neural-Symbolic Learner 标题:FF-NSL:前馈神经符号学习器

作者:Daniel Cunnington,Mark Law,Alessandra Russo,Jorge Lobo 备注:Pre-print, work in progress 链接:https://arxiv.org/abs/2106.13103 摘要:归纳逻辑程序设计(ILP)旨在以数据有效的方式学习概括的、可解释的假设。然而,当前的ILP系统要求以结构化的逻辑形式指定训练实例。本文介绍了一种神经符号学习框架,称为前馈神经符号学习器(FF-NSL),它将基于答案集语义的最新ILP系统与神经网络相结合,以便从标记的非结构化数据中学习可解释的假设。FF-NSL使用预先训练的神经网络从非结构化数据中提取符号事实,并使用ILP系统学习执行下游分类任务的假设。为了评估我们的方法在实际应用中的适用性,我们在非结构化输入数据中引入分布移位的任务上评估了该框架,对于这些任务,预先训练的神经网络可能会错误地预测,并且具有很高的置信度。实验结果表明,FF-NSL方法比随机森林和深度神经网络等基线方法具有更好的性能,可以用更少的例子学习更准确和可解释的假设。 摘要:Inductive Logic Programming (ILP) aims to learn generalised, interpretable hypotheses in a data-efficient manner. However, current ILP systems require training examples to be specified in a structured logical form. This paper introduces a neural-symbolic learning framework, called Feed-Forward Neural-Symbolic Learner (FF-NSL), that integrates state-of-the-art ILP systems based on the Answer Set semantics, with neural networks, in order to learn interpretable hypotheses from labelled unstructured data. FF-NSL uses a pre-trained neural network to extract symbolic facts from unstructured data and an ILP system to learn a hypothesis that performs a downstream classification task. In order to evaluate the applicability of our approach to real-world applications, the framework is evaluated on tasks where distributional shifts are introduced to unstructured input data, for which pre-trained neural networks are likely to predict incorrectly and with high confidence. Experimental results show that FF-NSL outperforms baseline approaches such as a random forest and deep neural networks by learning more accurate and interpretable hypotheses with fewer examples.

医学相关(7篇)

【1】 Understanding the Spread of COVID-19 Epidemic: A Spatio-Temporal Point Process View 标题:理解COVID-19疫情的传播:时空点过程视角

作者:Shuang Li,Lu Wang,Xinyun Chen,Yixiang Fang,Yan Song 机构:The Chinese University of Hong Kong 链接:https://arxiv.org/abs/2106.13097 摘要:自1月21日美国发现第一例冠状病毒病例以来,美国已有超过100万人确诊COVID-19。这种传染性呼吸道疾病已在美国3000多个县和50个州迅速传播,并表现出不断演化的聚集性和复杂的触发模式。理解该疾病时空交织的复杂传播过程,对于做出准确预测或实施明智的外部干预至关重要。在本文中,我们将COVID-19的传播建模为时空点过程,并提出了一个生成式、无需强度函数的模型来追踪疾病的传播。我们进一步采用生成对抗模仿学习框架来学习模型参数。与传统的基于似然的学习方法相比,该模仿学习框架无需预先指定强度函数,从而缓解了模型设定偏误。此外,对抗学习过程绕过了似然计算中难以求值的积分,使模型推断在数据量和变量规模上更具可扩展性。我们展示了在美国COVID-19确诊病例上的动态学习效果,并基于学得的生成模型评估了保持社交距离的政策。 摘要:Since the first coronavirus case was identified in the U.S. on Jan. 21, more than 1 million people in the U.S. have confirmed cases of COVID-19. This infectious respiratory disease has spread rapidly across more than 3000 counties and 50 states in the U.S. and have exhibited evolutionary clustering and complex triggering patterns. It is essential to understand the complex spacetime intertwined propagation of this disease so that accurate prediction or smart external intervention can be carried out. In this paper, we model the propagation of the COVID-19 as spatio-temporal point processes and propose a generative and intensity-free model to track the spread of the disease. We further adopt a generative adversarial imitation learning framework to learn the model parameters. In comparison with the traditional likelihood-based learning methods, this imitation learning framework does not need to prespecify an intensity function, which alleviates the model-misspecification. Moreover, the adversarial learning procedure bypasses the difficult-to-evaluate integral involved in the likelihood evaluation, which makes the model inference more scalable with the data and variables. We showcase the dynamic learning performance on the COVID-19 confirmed cases in the U.S. and evaluate the social distancing policy based on the learned generative model.

【2】 Neural Networks for Dengue Prediction: A Systematic Review 标题:神经网络在登革热预测中的系统评价

作者:Kirstin Roster,Francisco A. Rodrigues 机构:Instituto de Ciˆencias Matem´aticas e de Computa¸c˜ao, Universidade de S˜ao Paulo, S˜ao Carlos, SP, Brazil. 备注:16 pages, 6 figures, 1 table 链接:https://arxiv.org/abs/2106.12905 摘要:由于缺乏治疗和通用疫苗,登革热的早期预测是疾病控制的重要工具。神经网络是一种强大的预测模型,对公共卫生的许多领域都做出了贡献。在这篇系统综述中,我们介绍了与登革热预测相关的神经网络,并回顾了它们在文献中的应用。目的是为未来的工作提供模型设计方面的信息。根据PRISMA指南,我们对使用神经网络预测人群登革热的研究进行了系统的搜索。我们总结了神经网络和比较模型的相关性能、模型结构和超参数以及输入特征的选择。包括19篇论文。大多数研究使用登革热历史发病率和气象输入特征来实现浅层神经网络。预测范围往往很短。基于神经网络的优势,大多数研究在城市或地方层面上使用粒度观测。神经网络相对于支持向量机等比较器的性能因研究环境而异。研究表明,神经网络可以很好地预测登革热,应纳入候选模型。卷积网络、循环网络或深度网络的使用相对来说还未被探索,但为进一步研究提供了有希望的途径,使用更广泛的输入功能(如社交媒体或移动电话数据)也是如此。 摘要:Due to a lack of treatments and universal vaccine, early forecasts of Dengue are an important tool for disease control. Neural networks are powerful predictive models that have made contributions to many areas of public health. In this systematic review, we provide an introduction to the neural networks relevant to Dengue forecasting and review their applications in the literature. The objective is to help inform model design for future work. Following the PRISMA guidelines, we conduct a systematic search of studies that use neural networks to forecast Dengue in human populations. We summarize the relative performance of neural networks and comparator models, model architectures and hyper-parameters, as well as choices of input features. Nineteen papers were included. Most studies implement shallow neural networks using historical Dengue incidence and meteorological input features. Prediction horizons tend to be short. Building on the strengths of neural networks, most studies use granular observations at the city or sub-national level. Performance of neural networks relative to comparators such as Support Vector Machines varies across study contexts. The studies suggest that neural networks can provide good predictions of Dengue and should be included in the set of candidate models. The use of convolutional, recurrent, or deep networks is relatively unexplored but offers promising avenues for further research, as does the use of a broader set of input features such as social media or mobile phone data.

【3】 COVID-19 cases prediction using regression and novel SSM model for non-converged countries 标题:基于回归和新型SSM模型的未收敛国家COVID-19病例预测

作者:Tushar Sarkar,Umang Patel,Rupali Patil 备注:None 链接:https://arxiv.org/abs/2106.12888 摘要:预测2019新型冠状病毒病(COVID-19)新增相关或确诊病例的数量,对于对抗和控制COVID-19疫情至关重要。我们收集了2020年1月20日至2020年7月21日期间COVID-19的新增相关病例数据,筛选出疫情曲线正在收敛的国家,并用这些国家的数据训练模型。我们利用SARIMAX和线性回归模型预测尚未收敛国家的新增疑似COVID-19病例,并借助所提出的统计SARIMAX模型(SSM)预测未收敛国家的疫情曲线。我们给出了基于数据分析的新预测结果,可帮助政府规划后续行动,并帮助医疗管理部门为未来做更充分的准备。我们的框架利用线性回归预测新冠病例峰值,R方值达0.986,并能在印度、美国和巴西等国家的不同水平上预测疫情的回落。我们发现,纳入更多国家进行训练反而会降低预测效果,因为各国的约束条件各不相同。因此,我们期望本文给出的结果能帮助人们更好地理解这场疫情的可能走向。 摘要:Anticipating the quantity of new associated or affirmed cases with novel coronavirus ailment 2019 (COVID-19) is critical in the counteraction and control of the COVID-19 flare-up. The new associated cases with COVID-19 information were gathered from 20 January 2020 to 21 July 2020. We filtered out the countries which are converging and used those for training the network. We utilized the SARIMAX, Linear regression model to anticipate new suspected COVID-19 cases for the countries which did not converge yet. We predict the curve of non-converged countries with the help of proposed Statistical SARIMAX model (SSM). We present new information investigation-based forecast results that can assist governments with planning their future activities and help clinical administrations to be more ready for what's to come. Our framework can foresee peak corona cases with an R-Squared value of 0.986 utilizing linear regression and fall of this pandemic at various levels for countries like India, US, and Brazil. We found that considering more countries for training degrades the prediction process as constraints vary from nation to nation. Thus, we expect that the outcomes referenced in this work will help individuals to better understand the possibilities of this pandemic.
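摘要的核心工具之一是 SARIMAX;下面用 statsmodels 给出一个最小示意。数据为合成序列,阶数 (1, 1, 1) 仅为假设取值,并非论文的调参结果。

```python
import numpy as np
from statsmodels.tsa.statespace.sarimax import SARIMAX

# 合成的每日累计病例序列(仅作演示)
np.random.seed(0)
cases = np.cumsum(np.random.poisson(lam=50, size=120)).astype(float)

model = SARIMAX(cases, order=(1, 1, 1))   # (p, d, q) 为假设的阶数
fit = model.fit(disp=False)
forecast = fit.get_forecast(steps=14)     # 预测未来两周
print(forecast.predicted_mean)
```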

【4】 Q-space Conditioned Translation Networks for Directional Synthesis of Diffusion Weighted Images from Multi-modal Structural MRI 标题:Q空间条件平移网络用于多模态结构磁共振扩散加权图像的定向合成

作者:Mengwei Ren,Heejong Kim,Neel Dey,Guido Gerig 机构:Department of Computer Science and Engineering, New York University, NY, USA 备注:Accepted by MICCAI 2021. Project page: this https URL; Code: this https URL 链接:https://arxiv.org/abs/2106.13188 摘要:目前用于弥散磁共振成像建模的深度学习方法通过直接从稀疏采样的弥散加权成像(DWI)预测微观结构指标来避免对密集采样弥散加权成像(DWI)的需要。然而,在训练和重构过程中,它们隐含着对静态$q$空间采样的不切实际的假设。此外,这种方法可以限制可变采样dwi的下游使用,用于包括微结构指数或纤维束成像的估计。我们提出了一个生成性对抗翻译框架,用于高质量的DWI合成,在给定常见结构图像(如B0、T1、T2)的情况下,使用任意$q$空间采样。我们的翻译网络以连续的$q$-空间信息为条件线性调整其内部表示,从而消除了对固定采样方案的需要。此外,这种方法能够从任意子采样的dwi中下游估计高质量的微结构图,这在稀疏采样dwi的情况下可能特别重要。在最近的几种方法中,所提出的方法提高了DWI合成的精度和保真度,并通过从合成图像中估计的标量微结构指数的精度来量化下游效用。代码位于https://github.com/mengweiren/q-space-conditioned-dwi-synthesis. 摘要:Current deep learning approaches for diffusion MRI modeling circumvent the need for densely-sampled diffusion-weighted images (DWIs) by directly predicting microstructural indices from sparsely-sampled DWIs. However, they implicitly make unrealistic assumptions of static $q$-space sampling during training and reconstruction. Further, such approaches can restrict downstream usage of variably sampled DWIs for usages including the estimation of microstructural indices or tractography. We propose a generative adversarial translation framework for high-quality DWI synthesis with arbitrary $q$-space sampling given commonly acquired structural images (e.g., B0, T1, T2). Our translation network linearly modulates its internal representations conditioned on continuous $q$-space information, thus removing the need for fixed sampling schemes. Moreover, this approach enables downstream estimation of high-quality microstructural maps from arbitrarily subsampled DWIs, which may be particularly important in cases with sparsely sampled DWIs. Across several recent methodologies, the proposed approach yields improved DWI synthesis accuracy and fidelity with enhanced downstream utility as quantified by the accuracy of scalar microstructure indices estimated from the synthesized images. Code is available at https://github.com/mengweiren/q-space-conditioned-dwi-synthesis.
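摘要提到翻译网络"以连续 q-空间信息为条件,对内部表示做线性调制",这与 FiLM 式条件化思路一致。下面是一个最小的示意层,通道数、q 向量维度等均为假设,并非论文的官方实现。

```python
import torch
import torch.nn as nn

class QSpaceFiLM(nn.Module):
    """由 q-空间向量生成逐通道缩放/平移系数,线性调制特征图(示意)。"""
    def __init__(self, q_dim: int, channels: int):
        super().__init__()
        self.to_gamma_beta = nn.Linear(q_dim, 2 * channels)

    def forward(self, h: torch.Tensor, q: torch.Tensor) -> torch.Tensor:
        gamma, beta = self.to_gamma_beta(q).chunk(2, dim=-1)
        # h: (B, C, H, W);gamma/beta: (B, C),广播到空间维度
        return gamma[..., None, None] * h + beta[..., None, None]

film = QSpaceFiLM(q_dim=4, channels=32)
h = torch.randn(2, 32, 16, 16)   # 翻译网络的中间特征图(假设形状)
q = torch.randn(2, 4)            # 连续 q-空间条件(如 b 值与方向,假设编码)
out = film(h, q)
```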

【5】 VinDr-SpineXR: A deep learning framework for spinal lesions detection and classification from radiographs 标题:VinDr-SpineXR:一种用于从X线图像中检测和分类脊柱病变的深度学习框架

作者:Hieu T. Nguyen,Hieu H. Pham,Nghia T. Nguyen,Ha Q. Nguyen,Thang Q. Huynh,Minh Dao,Van Vu 机构: Medical Imaging Center, Vingroup Big Data Institute, Hanoi, Vietnam, School of Information and Communication Technology, Hanoi University of Science, and Technology, Hanoi, Vietnam, College of Engineering & Computer Science, VinUniversity, Hanoi, Vietnam 备注:This is a preprint of our paper which was accepted for publication by the International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI 2021) 链接:https://arxiv.org/abs/2106.12930 摘要:在临床实践中,X线片是识别脊柱异常最重要的影像学工具。然而,脊柱骨病变的评估对放射科医生而言是一项具有挑战性的任务。这项工作旨在开发和评估一个名为VinDr-SpineXR的基于深度学习的框架,用于脊柱X线片异常的分类和定位。首先,我们构建了一个大型数据集,包含来自5,000项研究的10,468张脊柱X线图像,每张图像都由经验丰富的放射科医生手工标注,用边界框圈出13类异常发现。利用该数据集,我们训练了一个深度学习分类器来判断脊柱扫描是否异常,并训练了一个检测器在全部13类发现中定位7类关键发现。VinDr-SpineXR在来自1,000项研究的2,078张图像的测试集上进行评估,该测试集与训练集严格分离。在图像级分类任务上,其受试者工作特征曲线下面积(AUROC)为88.61%(95% CI:87.19%,90.02%);在病灶级定位任务上,平均精度(mAP@0.5)为33.56%。这些结果可作为概念验证,并为该方向的后续研究确立了基线。为促进后续进展,数据集、代码和训练好的深度学习模型均已公开。 摘要:Radiographs are used as the most important imaging tool for identifying spine anomalies in clinical practice. The evaluation of spinal bone lesions, however, is a challenging task for radiologists. This work aims at developing and evaluating a deep learning-based framework, named VinDr-SpineXR, for the classification and localization of abnormalities from spine X-rays. First, we build a large dataset, comprising 10,468 spine X-ray images from 5,000 studies, each of which is manually annotated by an experienced radiologist with bounding boxes around abnormal findings in 13 categories. Using this dataset, we then train a deep learning classifier to determine whether a spine scan is abnormal and a detector to localize 7 crucial findings amongst the total 13. The VinDr-SpineXR is evaluated on a test set of 2,078 images from 1,000 studies, which is kept separate from the training set. It demonstrates an area under the receiver operating characteristic curve (AUROC) of 88.61% (95% CI 87.19%, 90.02%) for the image-level classification task and a mean average precision (mAP@0.5) of 33.56% for the lesion-level localization task. These results serve as a proof of concept and set a baseline for future research in this direction. To encourage advances, the dataset, codes, and trained deep learning models are made publicly available.

【6】 A Systematic Collection of Medical Image Datasets for Deep Learning 标题:面向深度学习的医学图像数据集系统收集

作者:Johann Li,Guangming Zhu,Cong Hua,Mingtao Feng,Basheer Bennamoun,Ping Li,Xiaoyuan Lu,Juan Song,Peiyi Shen,Xu Xu,Lin Mei,Liang Zhang,Syed Afaq Ali Shah,Mohammed Bennamoun 备注:This paper has been submitted to one journal 链接:https://arxiv.org/abs/2106.12864 摘要:人工智能在医疗保健等领域取得的惊人成功证明了人工智能可以达到与人类相似的性能。然而,成功总是伴随着挑战。深度学习算法依赖于数据,需要大量的数据集进行训练。医学影像领域数据的缺乏,给深度学习在医学影像分析中的应用带来了瓶颈。医学图像的获取、标注和分析成本高昂,其使用还受到伦理限制,并且需要人力专业知识和资金等大量资源。这使得非医学背景的研究人员很难获得有用的大规模医学数据。因此,本文尽可能全面地收集整理了医学图像数据集及其相关挑战赛,以供深度学习研究使用。我们收集了约三百个数据集和挑战赛的信息,它们主要发布于2013年至2020年间,并被分为四类:头颈部、胸腹部、病理与血液学以及"其他"。我们的论文有三个目的:1)提供一个最新、完整的列表,作为方便查找临床图像分析数据集的通用参考;2)在方法学上指导研究人员在相关数据集上测试和评估其方法的性能与鲁棒性;3)为相关医学主题提供通往相关算法及挑战赛排行榜的"路线"。 摘要:The astounding success made by artificial intelligence (AI) in healthcare and other fields proves that AI can achieve human-like performance. However, success always comes with challenges. Deep learning algorithms are data-dependent and require large datasets for training. The lack of data in the medical imaging field creates a bottleneck for the application of deep learning to medical image analysis. Medical image acquisition, annotation, and analysis are costly, and their usage is constrained by ethical restrictions. They also require many resources, such as human expertise and funding. That makes it difficult for non-medical researchers to have access to useful and large medical data. Thus, as comprehensive as possible, this paper provides a collection of medical image datasets with their associated challenges for deep learning research. We have collected information of around three hundred datasets and challenges mainly reported between 2013 and 2020 and categorized them into four categories: head & neck, chest & abdomen, pathology & blood, and ``others''. Our paper has three purposes: 1) to provide a most up to date and complete list that can be used as a universal reference to easily find the datasets for clinical image analysis, 2) to guide researchers on the methodology to test and evaluate their methods' performance and robustness on relevant datasets, 3) to provide a ``route'' to relevant algorithms for the relevant medical topics, and challenge leaderboards.

【7】 Multiclass Disease Predictions Based on Integrated Clinical and Genomics Datasets 标题:基于综合临床和基因组数据集的多类疾病预测

作者:Moeez M. Subhani,Ashiq Anjum 机构:College of Engineering and Technology, University of Derby, Derby, England 备注:None 链接:https://arxiv.org/abs/2006.07879 摘要:通过计算方法使用临床数据进行临床预测在生物信息学中很常见。然而,同时利用基因组数据集的信息进行临床预测在研究中并不多见。精准医学研究需要来自所有可用数据集的信息,以提供智能的临床解决方案。在本文中,我们尝试建立一个同时使用临床和基因组数据集信息的预测模型,用机器学习方法展示了基于临床与基因组整合数据集的多类疾病预测。我们使用临床(ClinVar)和基因组学(基因表达)数据集构建了一个整合数据集,并用基于实例的学习器对其进行训练以预测临床疾病。我们采用了一种创新而简单的多类分类方式,其输出类别数量高达75个,并使用主成分分析进行特征选择。该分类器在整合数据集上预测疾病的准确率为73%。与其他分类模型相比,结果稳定且具有竞争力。结果表明,基因组学信息可以可靠地纳入用于临床预测的数据集,并有望在临床诊断和精准医学中发挥价值。 摘要:Clinical predictions using clinical data by computational methods are common in bioinformatics. However, clinical predictions using information from genomics datasets as well is not a frequently observed phenomenon in research. Precision medicine research requires information from all available datasets to provide intelligent clinical solutions. In this paper, we have attempted to create a prediction model which uses information from both clinical and genomics datasets. We have demonstrated multiclass disease predictions based on combined clinical and genomics datasets using machine learning methods. We have created an integrated dataset, using a clinical (ClinVar) and a genomics (gene expression) dataset, and trained it using instance-based learner to predict clinical diseases. We have used an innovative but simple way for multiclass classification, where the number of output classes is as high as 75. We have used Principal Component Analysis for feature selection. The classifier predicted diseases with 73\% accuracy on the integrated dataset. The results were consistent and competent when compared with other classification models. The results show that genomics information can be reliably included in datasets for clinical predictions and it can prove to be valuable in clinical diagnostics and precision medicine.
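论文的流程可以概括为"PCA 特征选择 + 基于实例的学习器";下面用 scikit-learn 给出一个等价思路的管道示意。数据与维度均为虚构,KNN 作为基于实例学习器的常见代表,不一定与论文所用实现一致。

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(500, 2000)            # 虚构的临床+基因表达整合特征
y = np.random.randint(0, 75, size=500)   # 75 个疾病类别(与论文规模一致)

clf = make_pipeline(StandardScaler(), PCA(n_components=50),
                    KNeighborsClassifier(n_neighbors=5))
clf.fit(X[:400], y[:400])
print(clf.score(X[400:], y[400:]))       # 留出集准确率
```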

蒸馏|知识提取(1篇)

【1】 Distilling the Knowledge from Normalizing Flows 标题:从标准化流中蒸馏知识

作者:Dmitry Baranchuk,Vladimir Aliev,Artem Babenko 机构: Russia 2National Research UniversityHigher School of Economics 备注:ICML'2021 Workshop: INNF+2021 (Spotlight) 链接:https://arxiv.org/abs/2106.12699 摘要:标准化流(normalizing flows)是一类强大的生成模型,在若干语音和视觉问题上表现出很强的性能。与其他生成模型相比,标准化流具有易于处理的似然,并且训练稳定。然而,它们必须经过精心设计,以便用可高效计算雅可比行列式的方式表示可逆函数。在实践中,这些要求导致了过参数化且复杂的结构,其推理时间和内存消耗不如其他前馈模型。在这项工作中,我们研究能否把基于流的模型中的知识蒸馏到更高效的替代模型中。我们提出了一种简单的蒸馏方法,并在最新的基于条件流的图像超分辨率和语音合成模型上验证了其有效性,从而对上述问题给出了肯定回答。 摘要:Normalizing flows are a powerful class of generative models demonstrating strong performance in several speech and vision problems. In contrast to other generative models, normalizing flows have tractable likelihoods and allow for stable training. However, they have to be carefully designed to represent invertible functions with efficient Jacobian determinant calculation. In practice, these requirements lead to overparameterized and sophisticated architectures that are inferior to alternative feed-forward models in terms of inference time and memory consumption. In this work, we investigate whether one can distill knowledge from flow-based models to more efficient alternatives. We provide a positive answer to this question by proposing a simple distillation approach and demonstrating its effectiveness on state-of-the-art conditional flow-based models for image super-resolution and speech synthesis.
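蒸馏的基本形式是让轻量学生网络拟合教师(流模型)的输出;下面给出一个与这一思路一致但高度简化的 PyTorch 示意,教师/学生均为占位模块,并非论文针对超分辨率或语音合成的具体设计。

```python
import torch
import torch.nn as nn

teacher = nn.Sequential(nn.Linear(16, 256), nn.ReLU(), nn.Linear(256, 16)).eval()  # 代替训练好的流模型
student = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 16))           # 轻量前馈学生
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for step in range(1000):
    z = torch.randn(64, 16)              # 采样条件/潜变量(示意)
    with torch.no_grad():
        target = teacher(z)              # 教师生成的目标输出
    loss = nn.functional.mse_loss(student(z), target)
    opt.zero_grad(); loss.backward(); opt.step()
```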

推荐(2篇)

【1】 RikoNet: A Novel Anime Recommendation Engine 标题:RikoNet:一种新颖的动漫推荐引擎

作者:Badal Soni,Debangan Thakuria,Nilutpal Nath,Navarun Das,Bhaskarananda Boro 机构:Received: DD Month YEAR Accepted: DD Month YEAR 备注:19 pages 链接:https://arxiv.org/abs/2106.12970 摘要:动漫在今天很受欢迎,尤其是在年轻一代中。随着各种类型节目的出现,越来越多的人越来越被娱乐业这个利基领域所吸引。由于动漫最近获得了主流的关注,我们对于用户的喜好和观看习惯还没有足够的信息。因此,为这种相对晦涩的娱乐媒体构建推荐引擎是一项艰巨的任务。在这个尝试中,我们建立了一个新的混合推荐系统,既可以作为推荐系统,也可以作为探索新的动漫类型和标题的手段。我们分析了该领域的总体趋势和用户的观看习惯,提出了有效的解决方案。我们的解决方案使用深度自动编码器来预测评级和生成嵌入。接下来,我们使用动画标题的嵌入来形成集群。这些簇形成了具有相似性的动画搜索空间,用于查找与用户喜欢或不喜欢的动画相似的动画。这种方法,结合预测评级,形成了新的混合滤波器。在本文中,我们演示了这一思想,并将我们实现的模型的性能与现有的最新技术进行了比较。 摘要:Anime is quite well-received today, especially among the younger generations. With many genres of available shows, more and more people are increasingly getting attracted to this niche section of the entertainment industry. As anime has recently garnered mainstream attention, we have insufficient information regarding users' penchant and watching habits. Therefore, it is an uphill task to build a recommendation engine for this relatively obscure entertainment medium. In this attempt, we have built a novel hybrid recommendation system that could act both as a recommendation system and as a means of exploring new anime genres and titles. We have analyzed the general trends in this field and the users' watching habits for coming up with our efficacious solution. Our solution employs deep autoencoders for the tasks of predicting ratings and generating embeddings. Following this, we formed clusters using the embeddings of the anime titles. These clusters form the search space for anime with similarities and are used to find anime similar to the ones liked and disliked by the user. This method, combined with the predicted ratings, forms the novel hybrid filter. In this article, we have demonstrated this idea and compared the performance of our implemented model with the existing state-of-the-art techniques.

【2】 The Stereotyping Problem in Collaboratively Filtered Recommender Systems 标题:协同过滤推荐系统中的刻板印象问题

作者:Wenshuo Guo,Karl Krauth,Michael I. Jordan,Nikhil Garg 机构:Department of Electrical Engineering and Computer Sciences, Department of Statistics, University of California, Berkeley 链接:https://arxiv.org/abs/2106.12622 摘要:推荐系统——尤其是基于矩阵分解的协同过滤算法——在我们访问在线信息的过程中起着至关重要的作用。我们表明,这样的算法会导致一种特定的刻板印象:如果在一般用户群体中,对一组项目的偏好是反相关的,那么这些项目可能不会被一起推荐给某个用户,无论该用户自身的偏好和评分历史如何。首先,我们引入联合可达性(joint accessibility)的概念,用以度量一组项目能够被用户联合访问的程度。然后,我们在基于标准矩阵分解的协同过滤框架下研究联合可达性,给出了联合可达性被破坏的理论充要条件。此外,我们证明了当用户由单一特征向量表示时,这些条件很容易被违反。为了提高联合可达性,我们进一步提出了一种替代的建模修正,用多向量表示来刻画每个用户的多重兴趣。我们在真实数据集和模拟数据集上进行了大量实验,演示了标准单向量矩阵分解模型的刻板印象问题。 摘要:Recommender systems -- and especially matrix factorization-based collaborative filtering algorithms -- play a crucial role in mediating our access to online information. We show that such algorithms induce a particular kind of stereotyping: if preferences for a \textit{set} of items are anti-correlated in the general user population, then those items may not be recommended together to a user, regardless of that user's preferences and ratings history. First, we introduce a notion of \textit{joint accessibility}, which measures the extent to which a set of items can jointly be accessed by users. We then study joint accessibility under the standard factorization-based collaborative filtering framework, and provide theoretical necessary and sufficient conditions when joint accessibility is violated. Moreover, we show that these conditions can easily be violated when the users are represented by a single feature vector. To improve joint accessibility, we further propose an alternative modelling fix, which is designed to capture the diverse multiple interests of each user using a multi-vector representation. We conduct extensive experiments on real and simulated datasets, demonstrating the stereotyping problem with standard single-vector matrix factorization models.
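论文建议用多向量表示刻画用户的多重兴趣;一种常见的打分方式是对各兴趣向量取最大内积。下面的 numpy 片段演示该打分为何能同时"触达"人群中偏好反相关的两类物品,数值均为虚构,打分方式是示意性假设而非论文的精确公式。

```python
import numpy as np

item_a = np.array([1.0, 0.0])          # 两类在人群中偏好反相关的物品方向
item_b = np.array([-1.0, 0.0])
user_multi = np.array([[0.9, 0.1],     # 兴趣向量 1:偏向 item_a
                       [-0.8, 0.2]])   # 兴趣向量 2:偏向 item_b

def score(user_vecs, item):
    return float(max(user_vecs @ item))  # 对多个兴趣向量取最大内积

print(score(user_multi, item_a), score(user_multi, item_b))  # 两者都为正
# 单向量表示 u 只能满足其一:由于 item_b = -item_a,
# u·item_a > 0 必然意味着 u·item_b < 0,两类物品无法被联合访问
```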

聚类(2篇)

【1】 Differentially Private Algorithms for Clustering with Stability Assumptions 标题:稳定性假设下聚类的差分隐私算法

作者:Moshe Shechner 备注:Thesis submitted in partial fulfillment of the requirements for the M.Sc. degree in the Faculty of Natural Sciences. arXiv admin note: text overlap with arXiv:1907.02513 by other authors 链接:https://arxiv.org/abs/2106.12959 摘要:我们研究了输入稳定性假设下的差分隐私聚类问题。尽管关于差分隐私、特别是差分隐私聚类的研究数量不断增长,但只有三项工作(Nissim et al. 2007, Wang et al. 2015, Huang et al. 2018)研究了对"nice"k-means实例进行私有聚类的问题;三者都依赖采样-聚合(sample-and-aggregate)框架,并且都以真实聚类中心与私有算法返回中心之间的Wasserstein距离来度量效用。在这项工作中,我们从多个方面改进了这一系列工作:我们提出了一种简单得多的稳定输入聚类算法(不依赖采样-聚合框架),并同时在Wasserstein距离和k-means代价两方面分析其效用。此外,我们的算法可直接推广到"nice"k-median实例以及差分隐私的局部模型。 摘要:We study the problem of differentially private clustering under input-stability assumptions. Despite the ever-growing volume of works on differential privacy in general and differentially private clustering in particular, only three works (Nissim et al. 2007, Wang et al. 2015, Huang et al. 2018) looked at the problem of privately clustering "nice" k-means instances, all three relying on the sample-and-aggregate framework and all three measuring utility in terms of Wasserstein distance between the true cluster centers and the centers returned by the private algorithm. In this work we improve upon this line of works on multiple axes. We present a far simpler algorithm for clustering stable inputs (not relying on the sample-and-aggregate framework), and analyze its utility in both the Wasserstein distance and the k-means cost. Moreover, our algorithm has straight-forward analogues for "nice" k-median instances and for the local-model of differential privacy.

【2】 A review of systematic selection of clustering algorithms and their evaluation 标题:聚类算法的系统选择及其评价综述

作者:Marc Wegmann,Domenique Zipperling,Jonas Hillenbrand,Jürgen Fleischer 备注:for an interactive version visit this https URL 链接:https://arxiv.org/abs/2106.12792 摘要:数据分析在产业价值创造中起着不可或缺的作用。在这种情况下,聚类分析能够在很少或没有先验知识的情况下探索给定的数据集,并识别未知的模式。随着(大)数据复杂性在体积、种类和速度方面的增加,这变得更加重要。许多用于聚类分析的工具很早就被开发出来了,不同的聚类算法种类繁多。由于选择正确的聚类过程对数据分析的结果至关重要,用户在从原始数据中提取知识的过程中需要支持。因此,本文的目的在于为聚类算法确定一个系统的选择逻辑和相应的验证概念。其目标是使潜在用户能够选择一种最适合其需求和底层数据聚类问题特性的算法。此外,支持用户选择正确的验证概念来理解聚类结果。在文献综述的基础上,提出了聚类方法评价和验证概念选择的评价标准。这些准则适用于几种常见的算法,算法的选择过程通过引入基于伪代码的例程来支持,这些例程考虑了底层的数据结构。 摘要:Data analysis plays an indispensable role for value creation in industry. Cluster analysis in this context is able to explore given datasets with little or no prior knowledge and to identify unknown patterns. As (big) data complexity increases in the dimensions volume, variety, and velocity, this becomes even more important. Many tools for cluster analysis have been developed from early on and the variety of different clustering algorithms is huge. As the selection of the right clustering procedure is crucial to the results of the data analysis, users are in need for support on their journey of extracting knowledge from raw data. Thus, the objective of this paper lies in the identification of a systematic selection logic for clustering algorithms and corresponding validation concepts. The goal is to enable potential users to choose an algorithm that fits best to their needs and the properties of their underlying data clustering problem. Moreover, users are supported in selecting the right validation concepts to make sense of the clustering results. Based on a comprehensive literature review, this paper provides assessment criteria for clustering method evaluation and validation concept selection. The criteria are applied to several common algorithms and the selection process of an algorithm is supported by the introduction of pseudocode-based routines that consider the underlying data structure.

自动驾驶|车辆|车道检测等(1篇)

【1】 Deep Learning for Network Traffic Classification 标题:深度学习在网络流量分类中的应用

作者:Niloofar Bayat,Weston Jackson,Derrick Liu 机构:Columbia University 链接:https://arxiv.org/abs/2106.12693 摘要:监控网络流量以识别内容、服务和应用是网络流量控制系统中一个活跃的研究课题。虽然现代防火墙提供了解密数据包的能力,但这对隐私倡导者来说并不可取。因此,从加密流量中识别任何信息都是一项具有挑战性的任务。尽管如此,以前的工作已经找到了能够实现应用和服务识别的机器学习方法:先从网络数据包中提取高层特征,再训练一个鲁棒的机器学习分类器进行流量识别。我们提出了一种基于数据包、载荷和到达间隔时间序列的深度学习集成分类技术。据我们所知,这是首次将此类深度学习体系结构应用于服务器名称指示(SNI)分类问题。我们的集成模型优于最先进的机器学习方法,最新模型可在GitHub上获取:https://github.com/niloofarbayat/NetworkClassification 摘要:Monitoring network traffic to identify content, services, and applications is an active research topic in network traffic control systems. While modern firewalls provide the capability to decrypt packets, this is not appealing for privacy advocates. Hence, identifying any information from encrypted traffic is a challenging task. Nonetheless, previous work has identified machine learning methods that may enable application and service identification. The process involves high level feature extraction from network packet data then training a robust machine learning classifier for traffic identification. We propose a classification technique using an ensemble of deep learning architectures on packet, payload, and inter-arrival time sequences. To our knowledge, this is the first time such deep learning architectures have been applied to the Server Name Indication (SNI) classification problem. Our ensemble model beats the state of the art machine learning methods and our up-to-date model can be found on github: \url{https://github.com/niloofarbayat/NetworkClassification}

点云|SLAM|雷达|激光|深度RGBD相关(1篇)

【1】 Sparse Flows: Pruning Continuous-depth Models 标题:稀疏流:修剪连续深度模型

作者:Lucas Liebenwein,Ramin Hasani,Alexander Amini,Daniela Rus 机构:MIT CSAIL 链接:https://arxiv.org/abs/2106.12718 摘要:连续深度学习体系结构可以学习灵活的概率模型,用于预测建模(如神经常微分方程)和生成建模(如连续标准化流)。在这项工作中,我们设计了一个框架,通过剪枝这些连续深度模型的网络结构来解读它们的内部动态。我们的实验结果表明,剪枝可以提高生成建模中神经ODE的泛化能力。此外,剪枝能找到最小而高效的神经ODE表示,与原始网络相比参数最多减少98%,且不损失准确性。最后,我们证明了通过剪枝可以获得有关如何设计更好的神经ODE的深刻信息。我们希望我们的结果能为进一步研究现代连续深度模型的性能-规模权衡提供动力。 摘要:Continuous deep learning architectures enable learning of flexible probabilistic models for predictive modeling as neural ordinary differential equations (ODEs), and for generative modeling as continuous normalizing flows. In this work, we design a framework to decipher the internal dynamics of these continuous depth models by pruning their network architectures. Our empirical results suggest that pruning improves generalization for neural ODEs in generative modeling. Moreover, pruning finds minimal and efficient neural ODE representations with up to 98\% less parameters compared to the original network, without loss of accuracy. Finally, we show that by applying pruning we can obtain insightful information about the design of better neural ODEs.We hope our results will invigorate further research into the performance-size trade-offs of modern continuous-depth models.
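下面用 PyTorch 自带的剪枝工具演示摘要所述"大比例参数剪枝"的基本操作:对一个占位的 ODE 函数网络做 98% 的 L1 非结构化剪枝。网络结构为假设,论文中的剪枝策略与训练流程要复杂得多。

```python
import torch.nn as nn
import torch.nn.utils.prune as prune

# 占位的神经ODE右端函数 f(z)(示意)
ode_func = nn.Sequential(nn.Linear(32, 128), nn.Tanh(), nn.Linear(128, 32))

for module in ode_func:
    if isinstance(module, nn.Linear):
        prune.l1_unstructured(module, name="weight", amount=0.98)  # 按 L1 大小剪掉 98% 权重
        prune.remove(module, "weight")                             # 固化掩码,得到稀疏权重

zeros = sum(int((m.weight == 0).sum()) for m in ode_func if isinstance(m, nn.Linear))
print("被置零的权重数:", zeros)
```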

联邦学习|隐私保护|加密(3篇)

【1】 Privacy Threats Analysis to Secure Federated Learning 标题:面向安全联邦学习的隐私威胁分析

作者:Yuchen Li,Yifan Bao,Liyao Xiang,Junhan Liu,Cen Chen,Li Wang,Xinbing Wang 机构:Wang, Senior Member, IEEE 链接:https://arxiv.org/abs/2106.13076 摘要:联合学习是一种新兴的机器学习技术,它可以跨多个分散的参与方训练模型。它以保护隐私而闻名,因为数据永远不会离开计算设备,最近的方法通过隐藏以加密方式传输的消息来进一步增强其隐私性。然而,我们发现,尽管做出了努力,联邦学习仍然存在隐私威胁,因为它在不同方面具有互动性。分析了工业级安全计算联邦学习框架中的隐私威胁,揭示了线性回归、logistic回归和决策树等典型机器学习模型中广泛存在的隐私威胁。对于线性回归和logistic回归,我们通过理论分析表明,攻击者有可能在信息很少的情况下反转受害者的全部私人输入。对于决策树模型,我们发起一个攻击来推断受害者的私有输入的范围。所有攻击都在流行的联邦学习框架和真实数据集上进行评估。 摘要:Federated learning is emerging as a machine learning technique that trains a model across multiple decentralized parties. It is renowned for preserving privacy as the data never leaves the computational devices, and recent approaches further enhance its privacy by hiding messages transferred in encryption. However, we found that despite the efforts, federated learning remains privacy-threatening, due to its interactive nature across different parties. In this paper, we analyze the privacy threats in industrial-level federated learning frameworks with secure computation, and reveal such threats widely exist in typical machine learning models such as linear regression, logistic regression and decision tree. For the linear and logistic regression, we show through theoretical analysis that it is possible for the attacker to invert the entire private input of the victim, given very few information. For the decision tree model, we launch an attack to infer the range of victim's private inputs. All attacks are evaluated on popular federated learning frameworks and real-world datasets.

【2】 Personalized Federated Learning with Clustered Generalization 标题:基于聚类泛化的个性化联邦学习

作者:Xueyang Tang,Song Guo,Jingcai Guo 机构:Department of Computing, Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China 链接:https://arxiv.org/abs/2106.13044 摘要:我们研究了最近出现的个性化联邦学习(PFL),旨在解决联邦学习(FL)环境中非I.I.D.数据的挑战性问题。PFL与传统FL的主要区别在于训练目标,其中PFL中的个性化模型通常在训练模型的个性化(通常来自局部模型)和泛化(通常来自全局模型)之间进行权衡。传统的FL方法由于其良好的全局和局部模型,很难达到这一目标。目前流行的PFL方法通常保持一个全局模型来指导局部模型的训练过程,并向局部模型传递适当的泛化程度。然而,当多个局部数据集存在丰富的统计差异时,单一的全局模型只能提供一个推广方向,甚至可能将负面影响转移到一些局部模型。根据我们的观察,大多数真实的或合成的数据分布通常有一定程度的聚集性,其中我们认为不同的泛化方向可以促进PFL。在本文中,我们提出了一个新的概念,称为聚类泛化,以应对FL中统计异质性的挑战。具体来说,我们在服务器中维护多个全局(泛化)模型,以与客户端中相应数量的本地模型簇相关联,并进一步将PFL问题描述为一个能够高效、稳健求解的双层优化问题。我们还进行了详细的理论分析,为光滑非凸目标的收敛性提供了保证。在合成数据集和真实数据集上的实验结果表明,我们的方法大大超过了现有的方法。 摘要:We study the recent emerging personalized federated learning (PFL) that aims at dealing with the challenging problem of Non-I.I.D. data in the federated learning (FL) setting. The key difference between PFL and conventional FL lies in the training target, of which the personalized models in PFL usually pursue a trade-off between personalization (i.e., usually from local models) and generalization (i.e., usually from the global model) on trained models. Conventional FL methods can hardly meet this target because of their both well-developed global and local models. The prevalent PFL approaches usually maintain a global model to guide the training process of local models and transfer a proper degree of generalization to them. However, the sole global model can only provide one direction of generalization and may even transfer negative effects to some local models when rich statistical diversity exists across multiple local datasets. Based on our observation, most real or synthetic data distributions usually tend to be clustered to some degree, of which we argue different directions of generalization can facilitate the PFL. In this paper, we propose a novel concept called clustered generalization to handle the challenge of statistical heterogeneity in FL. Specifically, we maintain multiple global (generalized) models in the server to associate with the corresponding amount of local model clusters in clients, and further formulate the PFL as a bi-level optimization problem that can be solved efficiently and robustly. We also conduct detailed theoretical analysis and provide the convergence guarantee for the smooth non-convex objectives. Experimental results on both synthetic and real datasets show that our approach surpasses the state-of-the-art by a significant margin.
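聚类泛化的核心操作可以概括为:服务器维护 K 个全局模型,每轮将各客户端的本地模型分配给参数距离最近的全局模型,并按簇做 FedAvg 式聚合。下面是一个与该思想一致的简化 numpy 示意,省略了客户端基于所属全局模型的本地再训练,也并非论文双层优化问题的求解器。

```python
import numpy as np

K, n_clients, dim = 3, 12, 10
globals_ = np.random.randn(K, dim)                 # K 个全局(泛化)模型参数
client_models = np.random.randn(n_clients, dim)    # 各客户端本地训练后的参数(示意)

for round_ in range(20):
    # 1) 分配:每个客户端归入参数距离最近的全局模型所在簇
    dists = np.linalg.norm(client_models[:, None, :] - globals_[None], axis=-1)
    assign = dists.argmin(axis=1)
    # 2) 聚合:按簇对客户端参数取平均,空簇保持不变
    for k in range(K):
        members = client_models[assign == k]
        if len(members) > 0:
            globals_[k] = members.mean(axis=0)
    # 实际系统中,此处还应有客户端从所属全局模型出发的本地训练
```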

【3】 Low-Latency Federated Learning over Wireless Channels with Differential Privacy 标题:具有差分隐私的无线信道低时延联邦学习

作者:Kang Wei,Jun Li,Chuan Ma,Ming Ding,Cailian Chen,Shi Jin,Zhu Han,H. Vincent Poor 链接:https://arxiv.org/abs/2106.13039 摘要:在联邦学习(FL)中,模型训练分布在客户端上,本地模型由中央服务器聚合。由于数据分布不平衡、对隐私保护的潜在需求以及传输质量的差异,上传模型在这种情况下的性能可能差别很大。在本文中,我们的目标是在整体训练性能以及每个客户端差分隐私(DP)要求的约束下,最小化无线信道上的FL训练时延。我们在多智能体多臂老虎机(multi-agent multi-armed bandit,MAMAB)的框架下求解这一问题,以应对多个客户端面临不同未知传输环境(如信道衰落和干扰)的情形。具体而言,我们首先基于Lyapunov漂移技术,将训练性能和每个客户端DP的长期约束转化为虚拟队列。然后,通过用上置信界(UCB)方法估计奖励,我们在每一轮通信中将MAMAB转化为一个最大-最小二部匹配问题。更重要的是,我们为该匹配问题提出了两种有效的求解方法,即改进的匈牙利算法和带更优备选的贪婪匹配(GMBA):前者能以较高的复杂度获得最优解,后者则以可证明的较低复杂度和很小的性能损失实现了更好的折衷。此外,我们推导了该基于MAMAB的FL框架期望遗憾的上界,它随通信轮数的对数线性增长,证明了其理论可行性。大量实验结果验证了所提算法的有效性,并讨论了各种参数对无线边缘网络中FL性能的影响。 摘要:In federated learning (FL), model training is distributed over clients and local models are aggregated by a central server. The performance of uploaded models in such situations can vary widely due to imbalanced data distributions, potential demands on privacy protections, and quality of transmissions. In this paper, we aim to minimize FL training delay over wireless channels, constrained by overall training performance as well as each client's differential privacy (DP) requirement. We solve this problem in the framework of multi-agent multi-armed bandit (MAMAB) to deal with the situation where there are multiple clients confronting different unknown transmission environments, e.g., channel fading and interferences. Specifically, we first transform the long-term constraints on both training performance and each client's DP into a virtual queue based on the Lyapunov drift technique. Then, we convert the MAMAB to a max-min bipartite matching problem at each communication round, by estimating rewards with the upper confidence bound (UCB) approach. More importantly, we propose two efficient solutions to this matching problem, i.e., modified Hungarian algorithm and greedy matching with a better alternative (GMBA), in which the first one can achieve the optimal solution with a high complexity while the second one approaches a better trade-off by enabling a verified low-complexity with little performance loss. In addition, we develop an upper bound on the expected regret of this MAMAB based FL framework, which shows a linear growth over the logarithm of communication rounds, justifying its theoretical feasibility. Extensive experimental results are conducted to validate the effectiveness of our proposed algorithms, and the impacts of various parameters on the FL performance over wireless edge networks are also discussed.
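论文在每轮通信中用 UCB 估计奖励上界,并求解一个二部匹配;下面用 scipy 的匈牙利算法给出"UCB + 匹配"这一骨架的简化示意。客户端-信道的伯努利奖励为虚构,且这里求解的是最大化总和的指派问题,而非论文的最大-最小目标。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

n_clients, n_channels = 4, 4
true_reward = np.random.rand(n_clients, n_channels)           # 未知的真实平均奖励
mean = np.zeros((n_clients, n_channels))
count = np.ones((n_clients, n_channels))                      # 初始化为 1,避免除零

for t in range(1, 500):
    ucb = mean + np.sqrt(2 * np.log(t) / count)               # 上置信界
    rows, cols = linear_sum_assignment(-ucb)                  # 最大化匹配 => 取负后最小化
    for i, j in zip(rows, cols):                              # 执行匹配并更新估计
        r = float(np.random.rand() < true_reward[i, j])
        count[i, j] += 1
        mean[i, j] += (r - mean[i, j]) / count[i, j]
```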

推理|分析|理解|解释(8篇)

【1】 Towards Understanding and Mitigating Social Biases in Language Models 标题:语言模型中社会偏见的理解与缓解

作者:Paul Pu Liang,Chiyu Wu,Louis-Philippe Morency,Ruslan Salakhutdinov 机构: 20 19; 1Carnegie Mellon University 备注:ICML 2021, code available at this https URL 链接:https://arxiv.org/abs/2106.13219 摘要:随着机器学习方法在医疗保健、法律体系和社会科学等现实环境中的应用,认识到它们如何在这些敏感的决策过程中形成社会偏见和定型观念至关重要。在这样的现实世界部署中,大规模的预训练语言模型(LMs)在表现不受欢迎的代表性偏见方面具有潜在的危险性,这些偏见是由陈规定型观念造成的,这些陈规定型观念传播涉及性别、种族、宗教和其他社会结构的负面概括。作为提高LMs公平性的一个步骤,在提出新的基准和度量标准之前,我们仔细定义了代表性偏差的几个来源。利用这些工具,我们提出了减轻文本生成过程中社会偏见的步骤。我们的实验结果和人的评价表明,在保留高保真文本生成的关键上下文信息的同时,有效地减少了偏见,从而推动了性能公平帕累托前沿。 摘要:As machine learning methods are deployed in real-world settings such as healthcare, legal systems, and social science, it is crucial to recognize how they shape social biases and stereotypes in these sensitive decision-making processes. Among such real-world deployments are large-scale pretrained language models (LMs) that can be potentially dangerous in manifesting undesirable representational biases - harmful biases resulting from stereotyping that propagate negative generalizations involving gender, race, religion, and other social constructs. As a step towards improving the fairness of LMs, we carefully define several sources of representational biases before proposing new benchmarks and metrics to measure them. With these tools, we propose steps towards mitigating social biases during text generation. Our empirical results and human evaluation demonstrate effectiveness in mitigating bias while retaining crucial contextual information for high-fidelity text generation, thereby pushing forward the performance-fairness Pareto frontier.

【2】 Software for Dataset-wide XAI: From Local Explanations to Global Insights with Zennit, CoRelAy, and ViRelAy 标题:数据集范围的XAI软件:使用Zennit、CoRelAy和ViRelAy从局部解释到全局洞察

作者:Christopher J. Anders,David Neumann,Wojciech Samek,Klaus-Robert Müller,Sebastian Lapuschkin 机构:Machine Learning Group, Technische Universit¨at Berlin, Berlin, Germany, BIFOLD – Berlin Institute for the Foundations of Learning and Data, Berlin, Germany, Department of Artificial Intelligence, Fraunhofer Heinrich Hertz Institute, Berlin, Germany 备注:10 pages, 3 figures 链接:https://arxiv.org/abs/2106.13200 摘要:深度神经网络(DNNs)是强大的预测工具,但其预测策略却鲜为人知。随着可解释人工智能的最新进展,已有方法可以探究这些复杂模型预测背后的推理。其中一类方法是事后归因(post-hoc attribution)方法,其中逐层相关性传播(LRP)表现优异。然而,理解DNN推理的尝试往往止步于对输入空间中单个样本得到的归因,而未触及更深入的定量分析。由于缺少合适工具的手动分析往往耗费不必要的人力,我们面向科研人员介绍三个软件包,用于借助归因等方法探索模型推理:(1)Zennit:一个高度可定制且直观的归因框架,在PyTorch中实现了LRP及相关方法;(2)CoRelAy:一个可轻松快速搭建定量分析管道、用于数据集范围解释分析的框架;(3)ViRelAy:一个交互式探索数据、归因和分析结果的Web应用。 摘要:Deep Neural Networks (DNNs) are known to be strong predictors, but their prediction strategies can rarely be understood. With recent advances in Explainable Artificial Intelligence, approaches are available to explore the reasoning behind those complex models' predictions. One class of approaches are post-hoc attribution methods, among which Layer-wise Relevance Propagation (LRP) shows high performance. However, the attempt at understanding a DNN's reasoning often stops at the attributions obtained for individual samples in input space, leaving the potential for deeper quantitative analyses untouched. As a manual analysis without the right tools is often unnecessarily labor intensive, we introduce three software packages targeted at scientists to explore model reasoning using attribution approaches and beyond: (1) Zennit - a highly customizable and intuitive attribution framework implementing LRP and related approaches in PyTorch, (2) CoRelAy - a framework to easily and quickly construct quantitative analysis pipelines for dataset-wide analyses of explanations, and (3) ViRelAy - a web-application to interactively explore data, attributions, and analysis results.

【3】 Information Bottleneck: Exact Analysis of (Quantized) Neural Networks 标题:信息瓶颈:(量化)神经网络的精确分析

作者:Stephan Sloth Lorenzen,Christian Igel,Mads Nielsen 机构:University of Copenhagen 链接:https://arxiv.org/abs/2106.12912 摘要:信息瓶颈(IB)原理被认为是分析深层神经网络的一种方法。通过检测隐层与输入输出之间的互信息来研究学习动态。值得注意的是,在训练过程中有单独的拟合和压缩阶段的报道。这导致了一些争议,包括声称观察结果不可重复,并强烈依赖于使用的激活函数类型以及MI的估计方式。我们的研究证实,在计算MI时,不同的装箱方法会导致质的不同结果,无论是支持还是拒绝IB猜想。为了解决这一争议,我们研究了在MI是非平凡的并且可以精确计算的情况下的IB原理。我们监控量化神经网络的动态,也就是说,我们将整个深度学习系统离散化,这样在计算MI时就不需要近似。这使我们能够量化信息流而不产生测量误差。在这个设置中,我们观察到所有实验中的所有层的拟合阶段和输出层的压缩阶段;隐藏层中的压缩取决于激活函数的类型。我们的研究表明,在计算MI时,初始IB结果不是binning的伪影。然而,对于某些网络,压缩阶段可能无法观察到的关键观点也成立。 摘要:The information bottleneck (IB) principle has been suggested as a way to analyze deep neural networks. The learning dynamics are studied by inspecting the mutual information (MI) between the hidden layers and the input and output. Notably, separate fitting and compression phases during training have been reported. This led to some controversy including claims that the observations are not reproducible and strongly dependent on the type of activation function used as well as on the way the MI is estimated. Our study confirms that different ways of binning when computing the MI lead to qualitatively different results, either supporting or refusing IB conjectures. To resolve the controversy, we study the IB principle in settings where MI is non-trivial and can be computed exactly. We monitor the dynamics of quantized neural networks, that is, we discretize the whole deep learning system so that no approximation is required when computing the MI. This allows us to quantify the information flow without measurement errors. In this setting, we observed a fitting phase for all layers and a compression phase for the output layer in all experiments; the compression in the hidden layers was dependent on the type of activation function. Our study shows that the initial IB results were not artifacts of binning when computing the MI. However, the critical claim that the compression phase may not be observed for some networks also holds true.
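对于完全离散(量化)的系统,互信息可以由联合频率表精确计算而无需估计,这正是该文避免分箱争议的关键。下面的 numpy 函数演示这一计算,输入为两列离散符号序列,示例中的"有噪声映射"为虚构。

```python
import numpy as np

def exact_mi(x, y):
    """对离散变量 x, y,由联合频率表精确计算互信息(单位:nat)。"""
    x, y = np.asarray(x), np.asarray(y)
    xs, ys = np.unique(x), np.unique(y)
    joint = np.array([[np.mean((x == a) & (y == b)) for b in ys] for a in xs])
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float((joint[mask] * np.log(joint[mask] / (px @ py)[mask])).sum())

# 示例:量化后的输入符号 x 与隐层符号 t
x = np.random.randint(0, 4, size=10000)
t = (x + np.random.randint(0, 2, size=10000)) % 4   # 有噪声的映射(虚构)
print(exact_mi(x, t))
```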

【4】 A comprehensive empirical analysis on cross-domain semantic enrichment for detection of depressive language 标题:跨领域语义丰富检测抑郁性语言的综合实证分析

作者:Nawshad Farruque,Randy Goebel,Osmar Zaiane 机构:Department of Computing Science, University of Alberta, Alberta, T,G ,E, Canada 备注:This is an extension over ECML-PKDD, 2019 paper "Augmenting Semantic Representation of Depressive Language: from Forums to Microblogs", with more embedding mapping/augmentation methods and data ablation tests. These experiments were done in the year 2019 链接:https://arxiv.org/abs/2106.12797 摘要:我们分析了在注释数据稀少的情况下,例如,在tweet的抑郁语言检测中,为学习任务设计的单词嵌入特征表示的创建过程。我们首先从一个大的通用数据集中预先训练的丰富的单词嵌入开始,然后通过一个简单的非线性映射机制从一个更小更具体的领域数据集中学习嵌入来增强它。我们还尝试了其他一些更复杂的映射方法,包括基于自动编码器和基于自定义损失函数的方法,这些方法通过逐渐学习接近语义相似的单词和远离语义不同的单词来学习嵌入表示。我们的强化表示更好地捕捉了抑郁症领域的语义,因为它结合了从特定领域学习到的语义和从一般语言中获得的单词覆盖率。我们还提出了一个简单的词袋模型,众所周知的情感和心理语言学词汇,以及一个一般的预先训练的词嵌入表示的比较性能分析。当使用多种不同的机器学习方法(包括抑郁Tweets识别任务中的深度学习模型)作为特征表示时,我们发现我们的增强词嵌入表示比其他方法获得了显著更好的F1分数,特别是当应用于高质量的数据集时。此外,我们还提供了一些数据消融试验,证实了我们的增强技术的有效性。 摘要:We analyze the process of creating word embedding feature representations designed for a learning task when annotated data is scarce, for example, in depressive language detection from Tweets. We start with a rich word embedding pre-trained from a large general dataset, which is then augmented with embeddings learned from a much smaller and more specific domain dataset through a simple non-linear mapping mechanism. We also experimented with several other more sophisticated methods of such mapping including, several auto-encoder based and custom loss-function based methods that learn embedding representations through gradually learning to be close to the words of similar semantics and distant to dissimilar semantics. Our strengthened representations better capture the semantics of the depression domain, as it combines the semantics learned from the specific domain coupled with word coverage from the general language. We also present a comparative performance analyses of our word embedding representations with a simple bag-of-words model, well known sentiment and psycholinguistic lexicons, and a general pre-trained word embedding. When used as feature representations for several different machine learning methods, including deep learning models in a depressive Tweets identification task, we show that our augmented word embedding representations achieve a significantly better F1 score than the others, specially when applied to a high quality dataset. Also, we present several data ablation tests which confirm the efficacy of our augmentation techniques.
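论文的基础做法是用一个简单的非线性映射,把通用预训练词向量"领域化"后与原向量拼接作为增强表示;下面给出该映射的最小 PyTorch 示意。两组词向量为随机占位,实际应在共享词表上按行对齐,网络结构也仅为假设。

```python
import torch
import torch.nn as nn

general = torch.randn(5000, 300)   # 通用预训练词向量(占位)
domain = torch.randn(5000, 100)    # 领域(如抑郁症论坛)词向量,按共享词表与上面逐行对齐

mapper = nn.Sequential(nn.Linear(300, 200), nn.Tanh(), nn.Linear(200, 100))
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
for epoch in range(50):
    loss = nn.functional.mse_loss(mapper(general), domain)   # 监督回归学习映射
    opt.zero_grad(); loss.backward(); opt.step()

# 推断:为词表中的词生成"领域化"向量,并与原向量拼接作增强表示
augmented = torch.cat([general, mapper(general).detach()], dim=-1)
```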

【5】 Finite-Sample Analysis of Off-Policy TD-Learning via Generalized Bellman Operators 标题:基于广义Bellman算子的离策略TD学习的有限样本分析

作者:Zaiwei Chen,Siva Theja Maguluri,Sanjay Shakkottai,Karthikeyan Shanmugam 机构:edu†Georgia Institute of Technology, edu‡The University of Texas at Austin 链接:https://arxiv.org/abs/2106.12729 摘要:在时间差分(TD)学习中,离策略(off-policy)采样比在策略采样更为实用:它将学习与数据采集解耦,从而实现数据重用。众所周知,策略评估(包括多步离策略重要性采样)可以解释为求解一个广义Bellman方程。本文为求解该广义Bellman算子不动点的一般离策略类TD随机逼近算法给出了有限样本界。我们的关键步骤是证明广义Bellman算子对$[1,\infty)$中每一个$p$所对应的加权$\ell_p$范数同时是压缩映射,且具有公共的压缩因子。离策略TD学习因重要性采样比率的连乘而方差较高,文献中已提出许多算法(例如$Q^\pi(\lambda)$、Tree-Backup$(\lambda)$、Retrace$(\lambda)$和$Q$-trace)来解决这一问题,而我们的结果直接蕴含了这些算法的有限样本界。特别地,我们为$Q^\pi(\lambda)$、Tree-Backup$(\lambda)$和Retrace$(\lambda)$给出了首个已知的有限样本保证,并改进了[19]中$Q$-trace的最优已知界。此外,我们还展示了上述每种算法中的偏差-方差权衡。 摘要:In temporal difference (TD) learning, off-policy sampling is known to be more practical than on-policy sampling, and by decoupling learning from data collection, it enables data reuse. It is known that policy evaluation (including multi-step off-policy importance sampling) has the interpretation of solving a generalized Bellman equation. In this paper, we derive finite-sample bounds for any general off-policy TD-like stochastic approximation algorithm that solves for the fixed-point of this generalized Bellman operator. Our key step is to show that the generalized Bellman operator is simultaneously a contraction mapping with respect to a weighted $\ell_p$-norm for each $p$ in $[1,\infty)$, with a common contraction factor. Off-policy TD-learning is known to suffer from high variance due to the product of importance sampling ratios. A number of algorithms (e.g. $Q^\pi(\lambda)$, Tree-Backup$(\lambda)$, Retrace$(\lambda)$, and $Q$-trace) have been proposed in the literature to address this issue. Our results immediately imply finite-sample bounds of these algorithms. In particular, we provide first-known finite-sample guarantees for $Q^\pi(\lambda)$, Tree-Backup$(\lambda)$, and Retrace$(\lambda)$, and improve the best known bounds of $Q$-trace in [19]. Moreover, we show the bias-variance trade-offs in each of these algorithms.

【6】 Reimagining GNN Explanations with ideas from Tabular Data 标题:从表格数据中重新构思GNN解释

作者:Anjali Singh,Shamanth R Nayak K,Balaji Ganesan 机构:Manipal Institute of Technology 备注:4 pages, 8 figures, XAI Workshop at ICML 2021 链接:https://arxiv.org/abs/2106.12665 摘要:与基于表格数据训练的神经网络和决策树模型可用的解释相比,图神经网络的解释技术还有很长的路要走。借助一个横跨图与表格数据的任务,即实体匹配,我们对GNN模型解释中缺失的可解释性关键方面进行了评论。 摘要:Explainability techniques for Graph Neural Networks still have a long way to go compared to explanations available for both neural and decision tree-based models trained on tabular data. Using a task that straddles both graphs and tabular data, namely Entity Matching, we comment on key aspects of explainability that are missing in GNN model explanations.

【7】 Stock Market Analysis with Text Data: A Review 标题:基于文本数据的股市分析研究综述

作者:Kamaladdin Fataliyev,Aneesh Chivukula,Mukesh Prasad,Wei Liu 机构:School of Computer Science, University of Technology Sydney, Ultimo NSW , Sydney, Australia, Australian Artificial Intelligence Institute, University of Technology Sydney, Ultimo NSW , Sydney, Australia, A R T I C L E I N F O 链接:https://arxiv.org/abs/2106.12985 摘要:股市走势受通过新闻文章、公司报告和社交媒体讨论共享的公共和私人信息的影响。分析这些庞大的数据来源可以给市场参与者带来盈利的优势。然而,文献中的大多数研究都是基于传统的方法来分析非结构化的、大量的文本数据。在本研究中,我们回顾了大量现有的基于文本的股票市场分析文献。我们介绍了输入数据类型,并涵盖了主要的文本数据源和变体。然后介绍了特征表示技术。然后,我们介绍了分析技术,并创建了主要股市预测模型的分类。重要的是,我们将讨论分类法中每个类别的代表性工作,分析它们各自的贡献。最后,本文给出了未解决的开放性问题的研究结果,并对今后的工作提出了建议。本文的研究目的是综述目前主要的股票市场分析模型、金融市场预测的文本表示技术、现有技术的不足,并提出未来研究的方向。 摘要:Stock market movements are influenced by public and private information shared through news articles, company reports, and social media discussions. Analyzing these vast sources of data can give market participants an edge to make profit. However, the majority of the studies in the literature are based on traditional approaches that come short in analyzing unstructured, vast textual data. In this study, we provide a review on the immense amount of existing literature of text-based stock market analysis. We present input data types and cover main textual data sources and variations. Feature representation techniques are then presented. Then, we cover the analysis techniques and create a taxonomy of the main stock market forecast models. Importantly, we discuss representative work in each category of the taxonomy, analyzing their respective contributions. Finally, this paper shows the findings on unaddressed open problems and gives suggestions for future work. The aim of this study is to survey the main stock market analysis models, text representation techniques for financial market prediction, shortcomings of existing techniques, and propose promising directions for future research.

【8】 Understanding Modern Techniques in Optimization: Frank-Wolfe, Nesterov's Momentum, and Polyak's Momentum 标题:理解最优化中的现代技术:弗兰克-沃尔夫、内斯特罗夫的动量和波利亚克的动量

作者:Jun-Kun Wang 机构:School of Computer Science, College of Computing, Georgia Institute of Technology 备注:PhD dissertation at Georgia Tech. arXiv admin note: text overlap with arXiv:2010.01618 链接:https://arxiv.org/abs/2106.12923 摘要:在本论文的第一部分,我们开发了一个模块化框架,可作为构造和分析凸优化迭代算法的"配方"。具体而言,我们将优化问题转化为迭代进行的两人零和博弈。许多现有的优化算法,包括Frank-Wolfe和Nesterov的加速方法,都可以通过让两个在线学习者以适当策略相互对抗而从该博弈中还原出来。此外,博弈双方加权平均遗憾之和给出了收敛速度。因此,我们的方法为这些算法提供了简单的替代证明。同时,"优化即迭代博弈"的视角还在若干约束集上催生了三种新的快速Frank-Wolfe类算法,这进一步表明我们的框架确实通用、模块化且易于使用。在第二部分,我们对Polyak动量在若干问题上的可证加速进行了模块化分析,这些问题包括求解经典的强凸二次问题、在神经正切核(NTK)区域下训练宽ReLU网络,以及训练正交初始化的深度线性网络。我们建立了一个元定理,并证明在这些问题上应用Polyak动量时,诱导出的动力学恰好具有可直接套用该元定理的形式。在论文的最后一部分,我们展示了利用Polyak动量的另一个优点——它有助于光滑非凸优化中鞍点的快速逃逸。这一结果与第二部分的结果一起,为Polyak动量在现代非凸优化和深度学习中的作用提供了新的认识。 摘要:In the first part of this dissertation research, we develop a modular framework that can serve as a recipe for constructing and analyzing iterative algorithms for convex optimization. Specifically, our work casts optimization as iteratively playing a two-player zero-sum game. Many existing optimization algorithms including Frank-Wolfe and Nesterov's acceleration methods can be recovered from the game by pitting two online learners with appropriate strategies against each other. Furthermore, the sum of the weighted average regrets of the players in the game implies the convergence rate. As a result, our approach provides simple alternative proofs to these algorithms. Moreover, we demonstrate that our approach of optimization as iteratively playing a game leads to three new fast Frank-Wolfe-like algorithms for some constraint sets, which further shows that our framework is indeed generic, modular, and easy-to-use. In the second part, we develop a modular analysis of provable acceleration via Polyak's momentum for certain problems, which include solving the classical strongly quadratic convex problems, training a wide ReLU network under the neural tangent kernel regime, and training a deep linear network with an orthogonal initialization. We develop a meta theorem and show that when applying Polyak's momentum for these problems, the induced dynamics exhibit a form where we can directly apply our meta theorem. In the last part of the dissertation, we show another advantage of the use of Polyak's momentum -- it facilitates fast saddle point escape in smooth non-convex optimization. This result, together with those of the second part, sheds new light on Polyak's momentum in modern non-convex optimization and deep learning.
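文中反复出现的 Polyak 动量(重球法)更新可写成如下形式,并与 Nesterov 加速对照(β 为动量系数,η 为步长;这是两种方法的标准写法,非该论文特有记号):

```latex
% Polyak 重球法:直接在上一步位移上加动量
w_{t+1} = w_t - \eta \nabla f(w_t) + \beta\,(w_t - w_{t-1})
% Nesterov 加速:先外推再在外推点求梯度
y_t = w_t + \beta\,(w_t - w_{t-1}), \qquad w_{t+1} = y_t - \eta \nabla f(y_t)
```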

检测相关(7篇)

【1】 Real-time Spatio-temporal Event Detection on Geotagged Social Media 标题:地理标记社交媒体上的实时时空事件检测

作者:Yasmeen George,Shanika Karunasekera,Aaron Harwood,Kwan Hui Lim 机构:Correspondence:, School of Computing and, Information Systems, The, University of Melbourne, Australia, Full list of author information is, available at the end of the, article 备注:Accepted to Journal of Big Data 链接:https://arxiv.org/abs/2106.13121 摘要:挖掘社交媒体数据流的一个关键挑战,是识别特定地区或全球范围内一群人正在积极讨论的事件。此类事件对于事故、抗议、选举或突发新闻的预警很有用。然而,事件列表以及事件时间和空间的分辨率都不是固定的,也无法预先知道。在这项工作中,我们提出了一个使用社交媒体的在线时空事件检测系统,能够在不同的时间和空间分辨率下检测事件。首先,针对事件空间分辨率未知的问题,利用四叉树方法,根据社交媒体数据的密度将地理空间划分为多尺度区域。然后,采用基于泊松分布的无监督统计方法并结合平滑,突出显示社交帖子密度异常的区域。此外,通过合并同一区域在连续时间段内发生的事件来精确估计事件持续时间,并引入后处理阶段来过滤垃圾信息、虚假事件或错误事件。最后,我们通过利用社交媒体实体来评估所检测事件的完整性和准确性,从而融入了简单的语义信息。该方法使用不同的社交媒体数据集(Twitter和Flickr)在墨尔本、伦敦、巴黎和纽约等城市上进行评估。为了验证该方法的有效性,我们将结果与基于固定地理空间分割和聚类的两种基线算法进行了比较。在性能评估方面,我们手动计算查全率和查准率,并提出了一种称为强度指数(strength index)的新质量度量,它能自动度量所报告事件的准确性。 摘要:A key challenge in mining social media data streams is to identify events which are actively discussed by a group of people in a specific local or global area. Such events are useful for early warning for accident, protest, election or breaking news. However, neither the list of events nor the resolution of both event time and space is fixed or known beforehand. In this work, we propose an online spatio-temporal event detection system using social media that is able to detect events at different time and space resolutions. First, to address the challenge related to the unknown spatial resolution of events, a quad-tree method is exploited in order to split the geographical space into multiscale regions based on the density of social media data. Then, a statistical unsupervised approach is performed that involves Poisson distribution and a smoothing method for highlighting regions with unexpected density of social posts. Further, event duration is precisely estimated by merging events happening in the same region at consecutive time intervals. A post processing stage is introduced to filter out events that are spam, fake or wrong. Finally, we incorporate simple semantics by using social media entities to assess the integrity, and accuracy of detected events. The proposed method is evaluated using different social media datasets: Twitter and Flickr for different cities: Melbourne, London, Paris and New York. To verify the effectiveness of the proposed method, we compare our results with two baseline algorithms based on fixed split of geographical space and clustering method. For performance evaluation, we manually compute recall and precision. We also propose a new quality measure named strength index, which automatically measures how accurate the reported event is.
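区域异常高亮的统计核心是泊松"惊异度":给定区域在同时段的历史平均帖量 λ,当前观测 k 的右尾概率越小越异常。下面用 scipy 演示这一判据,λ、k 与阈值均为示意取值,并非论文的具体参数。

```python
from scipy.stats import poisson

lam = 12.0        # 该区域在同时段的历史平均帖子数(示意)
k = 31            # 当前时间窗内观测到的帖子数(示意)

p_value = poisson.sf(k - 1, lam)   # P(X >= k),右尾概率
if p_value < 1e-3:                 # 报警阈值为假设值
    print(f"异常密度区域:p={p_value:.2e}")
```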

【2】 InFlow: Robust outlier detection utilizing Normalizing Flows 标题:InFlow:利用归一化流进行稳健的离群值检测

作者:Nishant Kumar,Pia Hanfeld,Michael Hecht,Michael Bussmann,Stefan Gumhold,Nico Hoffmann 机构:HZDR, Dresden, Germany; CASUS, Görlitz, Germany; CGV, TU Dresden 链接:https://arxiv.org/abs/2106.12894 摘要:规范化流是一种重要的深层生成模型,它提供了易于处理的概率分布和有效的密度估计。然而,众所周知,它们在检测分布外(OOD)输入时会失败,因为它们直接在其潜在空间中编码输入表示的局部特征。在本文中,我们证明:只要通过注意力机制对流进行扩展,它就能可靠地检测包括对抗性攻击在内的异常值,从而解决了规范化流的过度自信问题。我们的方法不需要异常数据进行训练;我们在多种实验环境中报告了最先进的性能,展示了该OOD检测方法的效率。代码位于https://github.com/ComputationalRadiationPhysics/InFlow . 摘要:Normalizing flows are prominent deep generative models that provide tractable probability distributions and efficient density estimation. However, they are well known to fail while detecting Out-of-Distribution (OOD) inputs as they directly encode the local features of the input representations in their latent space. In this paper, we solve this overconfidence issue of normalizing flows by demonstrating that flows, if extended by an attention mechanism, can reliably detect outliers including adversarial attacks. Our approach does not require outlier data for training and we showcase the efficiency of our method for OOD detection by reporting state-of-the-art performance in diverse experimental settings. Code available at https://github.com/ComputationalRadiationPhysics/InFlow .

【3】 Partial Wasserstein and Maximum Mean Discrepancy distances for bridging the gap between outlier detection and drift detection 标题:用于弥合孤立点检测和漂移检测之间差距的部分Wasserstein和最大平均差异距离

作者:Thomas Viehmann 链接:https://arxiv.org/abs/2106.12893 摘要:随着机器学习和基于深度学习的应用在实践中的兴起,监控——即验证这些应用是否在规范内运行——已经成为一个重要的实际问题。这种监控的一个重要方面是检查输入(或中间产物)是否偏离了其验证时所针对的分布,这种偏离可能会使测试期间获得的性能保证失效。对此有两种常见的方法。较经典的一种是异常值检测或新颖性检测:对于单个输入,我们询问它是否是异常值,即极不可能来自参考分布。第二种、也许更新近的方法,是考虑数量更多的输入,并将其分布与参考分布(例如,在测试期间采样得到)进行比较;这类工作通常冠以"漂移检测"之名。在这项工作中,我们通过将给定数量的输入与自动选取的参考分布子集进行比较,弥合了离群点检测和漂移检测之间的差距。 摘要:With the rise of machine learning and deep learning based applications in practice, monitoring, i.e. verifying that these operate within specification, has become an important practical problem. An important aspect of this monitoring is to check whether the inputs (or intermediates) have strayed from the distribution they were validated for, which can void the performance assurances obtained during testing. There are two common approaches for this. The, perhaps, more classical one is outlier detection or novelty detection, where, for a single input we ask whether it is an outlier, i.e. exceedingly unlikely to have originated from a reference distribution. The second, perhaps more recent approach, is to consider a larger number of inputs and compare its distribution to a reference distribution (e.g. sampled during testing). This is done under the label drift detection. In this work, we bridge the gap between outlier detection and drift detection through comparing a given number of inputs to an automatically chosen part of the reference distribution.
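(编者补充)摘要标题中的最大平均差异(MMD)可以作为"一批新输入 vs. 参考分布样本"的漂移统计量。下面是使用 RBF 核与中位数启发式带宽的有偏 MMD^2 的最小示意,数据为人工构造,仅说明思路:

```python
import numpy as np

def mmd2_rbf(X, Y, sigma=None):
    """有偏 MMD^2 估计,RBF 核;sigma 缺省用中位数启发式。"""
    Z = np.vstack([X, Y])
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)   # 成对平方距离
    if sigma is None:
        sigma = np.sqrt(np.median(d2[d2 > 0]) / 2)
    K = np.exp(-d2 / (2 * sigma ** 2))
    n = len(X)
    Kxx, Kyy, Kxy = K[:n, :n], K[n:, n:], K[:n, n:]
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

rng = np.random.default_rng(0)
ref = rng.normal(0.0, 1.0, (200, 5))          # 参考分布样本(如测试期采样)
same = rng.normal(0.0, 1.0, (200, 5))         # 同分布的新输入
drifted = rng.normal(0.5, 1.0, (200, 5))      # 均值发生漂移的新输入
print(mmd2_rbf(ref, same), mmd2_rbf(ref, drifted))   # 后者应明显更大
```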

【4】 DCoM: A Deep Column Mapper for Semantic Data Type Detection 标题:DCOM:一种用于语义数据类型检测的深列映射器

作者:Subhadip Maji,Swapna Sourav Rout,Sudeep Choudhary 机构:Optum Global Solutions, Bangalore, India , Senior Data Scientist 备注:9 pages, 2 figures, 7 tables 链接:https://arxiv.org/abs/2106.12871 摘要:语义数据类型检测是数据科学中一项非常重要的任务,可以实现数据的自动清洗、模式匹配、数据发现、语义数据类型规范化和敏感数据识别。现有的方法包括基于正则表达式或基于字典查找的方法,这些方法对脏数据和看不见的数据不具有鲁棒性,并且只能预测非常少的语义数据类型。现有的机器学习方法从数据中提取大量的工程特征,建立logistic回归、随机森林或前馈神经网络。在本文中,我们引入了DCoM(一种基于多输入NLP的深度神经网络)来检测语义数据类型,它不是从数据中提取大量的特征,而是将列(或实例)的原始值作为文本输入到模型中。我们在从VizNet语料库中提取的686765个数据列上训练DCoM,这些数据列包含78种不同的语义数据类型。在同一数据集上,DCoM的性能优于其他当代结果,具有相当大的优势。 摘要:Detection of semantic data types is a very crucial task in data science for automated data cleaning, schema matching, data discovery, semantic data type normalization and sensitive data identification. Existing methods include regular expression-based or dictionary lookup-based methods that are not robust to dirty as well unseen data and are limited to a very less number of semantic data types to predict. Existing Machine Learning methods extract large number of engineered features from data and build logistic regression, random forest or feedforward neural network for this purpose. In this paper, we introduce DCoM, a collection of multi-input NLP-based deep neural networks to detect semantic data types where instead of extracting large number of features from the data, we feed the raw values of columns (or instances) to the model as texts. We train DCoM on 686,765 data columns extracted from VizNet corpus with 78 different semantic data types. DCoM outperforms other contemporary results with a quite significant margin on the same dataset.

【5】 DeepAuditor: Distributed Online Intrusion Detection System for IoT devices via Power Side-channel Auditing 标题:DeepAuditor:基于电源侧通道审计的分布式物联网设备在线入侵检测系统

作者:Woosub Jung,Yizhou Feng,Sabbir Ahmed Khan,Chunsheng Xin,Danella Zhao,Gang Zhou 机构:Department of Computer Science, William & Mary, Department of Computer Science, Old Dominion University 链接:https://arxiv.org/abs/2106.12753 摘要:随着物联网设备数量的迅速增加,物联网僵尸网络利用了物联网设备的漏洞。然而,在大规模攻击之前检测物联网设备上的初始入侵仍然是一个挑战。最近的研究利用电力侧信道信息来描述物联网设备上的这种入侵行为,但仍然缺乏实时检测方法。本研究旨在设计一个基于能量审计的物联网设备在线入侵检测系统DeepAuditor。为了实现系统的实时性,我们首先提出了一种轻量级的电源审计设备powerauditor。利用Power-Auditor,我们开发了一个分布式CNN分类器,用于实验室环境下的在线推理。为了保护数据泄漏和减少网络冗余,我们还提出了基于压缩同态加密的隐私保护推理协议和滑动窗口协议。分类准确度和处理时间是在我们的实验室设置测量。我们还证明了分布式CNN设计对任何分布式组件都是安全的。总的来说,这些测量结果证明了我们的实时分布式系统在物联网设备上进行入侵检测的可行性。 摘要:As the number of IoT devices has increased rapidly, IoT botnets have exploited the vulnerabilities of IoT devices. However, it is still challenging to detect the initial intrusion on IoT devices prior to massive attacks. Recent studies have utilized power side-channel information to characterize this intrusion behavior on IoT devices but still lack real-time detection approaches. This study aimed to design an online intrusion detection system called DeepAuditor for IoT devices via power auditing. To realize the real-time system, we first proposed a lightweight power auditing device called Power Auditor. With the Power Auditor, we developed a Distributed CNN classifier for online inference in our laboratory setting. In order to protect data leakage and reduce networking redundancy, we also proposed a privacy-preserved inference protocol via Packed Homomorphic Encryption and a sliding window protocol in our system. The classification accuracy and processing time were measured in our laboratory settings. We also demonstrated that the distributed CNN design is secure against any distributed components. Overall, the measurements were shown to the feasibility of our real-time distributed system for intrusion detection on IoT devices.

【6】 Deep Fake Detection: Survey of Facial Manipulation Detection Solutions 标题:深度伪造检测:面部操纵检测解决方案综述

作者:Samay Pashine,Sagar Mandiya,Praveen Gupta,Rashid Sheikh 机构:Dept. of Computer Science, AITR, Indore, India 备注:None 链接:https://arxiv.org/abs/2106.12605 摘要:作为一个领域,深度学习已经成功地用于解决大量复杂的问题,这种问题是几十年前我们无法想象的。但是,尽管它带来了很多好处,但仍然有一些方法可以用来给我们的社会带来危害。深度造假已经被证明是一个这样的问题,现在比以往任何时候,当任何个人都可以创造一个假图像或视频简单地使用智能手机上的应用程序,需要有一些对策,利用该方法可以检测出图像或视频是真是假,并对威胁网络信息可信度的问题进行处理。虽然由神经网络产生的深赝品看起来像真实的图像或视频一样真实,但经过适度处理后仍会留下时空痕迹或特征,这些特征在人眼看不见的情况下,可以借助专门从事深赝品检测的神经网络进行检测。本文分析了几种最先进的神经网络(MesoNet、ResNet-50、VGG-19和exception-Net),并对它们进行了比较,为各种场景寻找最佳解决方案,如实时深度假检测,部署在在线社交媒体平台上,分类应尽可能快,或为小型通讯社,分类不需要实时,但需要最大的准确性。 摘要:Deep Learning as a field has been successfully used to solve a plethora of complex problems, the likes of which we could not have imagined a few decades back. But as many benefits as it brings, there are still ways in which it can be used to bring harm to our society. Deep fakes have been proven to be one such problem, and now more than ever, when any individual can create a fake image or video simply using an application on the smartphone, there need to be some countermeasures, with which we can detect if the image or video is a fake or real and dispose of the problem threatening the trustworthiness of online information. Although the Deep fakes created by neural networks, may seem to be as real as a real image or video, it still leaves behind spatial and temporal traces or signatures after moderation, these signatures while being invisible to a human eye can be detected with the help of a neural network trained to specialize in Deep fake detection. In this paper, we analyze several such states of the art neural networks (MesoNet, ResNet-50, VGG-19, and Xception Net) and compare them against each other, to find an optimal solution for various scenarios like real-time deep fake detection to be deployed in online social media platforms where the classification should be made as fast as possible or for a small news agency where the classification need not be in real-time but requires utmost accuracy.

【7】 Artifact Detection and Correction in EEG data: A Review 标题:脑电数据伪影检测与校正研究进展

作者:S Sadiya,T Alhanai,MM Ghassemi 备注:None 链接:https://arxiv.org/abs/2106.13081 摘要:脑电图(EEG)在许多领域有着无数的应用。然而,脑电的应用受到低信噪比的限制。多种类型的伪影导致了脑电信号的噪声,许多技术被提出来检测和校正这些伪影。这些技术的范围从简单地检测和拒绝伪影充斥的部分,从脑电信号提取噪声成分。在这篇文章中,我们回顾了各种最近和经典的脑电数据伪影检测和校正技术,重点放在过去的五年。我们比较了这些方法的优点和缺点,并提出了该领域未来的发展方向。 摘要:Electroencephalography (EEG) has countless applications across many of fields. However, EEG applications are limited by low signal-to-noise ratios. Multiple types of artifacts contribute to the noisiness of EEG, and many techniques have been proposed to detect and correct these artifacts. These techniques range from simply detecting and rejecting artifact ridden segments, to extracting the noise component from the EEG signal. In this paper we review a variety of recent and classical techniques for EEG data artifact detection and correction with a focus on the last half-decade. We compare the strengths and weaknesses of the approaches and conclude with proposed future directions for the field.

分类|识别(6篇)

【1】 High Performance Hyperspectral Image Classification using Graphics Processing Units 标题:基于图形处理单元的高性能高光谱图像分类

作者:Mahmoud Hossam 机构:Basic Science Department, Faculty of Computer and Information Sciences, Ain Shams University 备注:Master Thesis, Ain Shams University 链接:https://arxiv.org/abs/2106.12942 摘要:实时遥感应用,如搜索和救援任务、军事目标探测、环境监测、灾害预防和其他时间关键的应用,需要机载实时处理能力或自主决策。一些无人遥控系统(如卫星)在物理上远离其操作者,对航天器的所有控制和航天器返回的数据都必须通过无线电链路传输。当卫星离开地面站的视线时,这条链路可能长时间不可用。因此,对于机载实时处理系统来说,重量轻、体积小、功耗低的硬件是必不可少的。随着近年来高光谱成像传感器的维数、尺寸和分辨率的不断提高,对遥感处理系统提出了新的挑战,需要更强大的计算体系结构。图形处理单元(GPU)是一种很有前途的轻量级高性能计算体系结构,可以满足机载系统的这些计算需求。本研究的目的是建立高性能的机载高光谱分析方法。我们为著名的递归层次分割(RHSEG)聚类方法提出了加速方案,分别使用GPU、带GPU的混合多核CPU以及混合多核CPU/GPU集群。RHSEG是由美国国家航空航天局(NASA)开发的一种方法,其目的是提供具有多个输出级别的丰富分类信息。与CPU顺序实现相比,并行方案取得的加速比为:单GPU并行21倍,具有16个计算节点的混合多节点计算机集群240倍。与同等的并行CPU集群相比,使用单个GPU可将能耗降低到74%。 摘要:Real-time remote sensing applications like search and rescue missions, military target detection, environmental monitoring, hazard prevention and other time-critical applications require onboard real time processing capabilities or autonomous decision making. Some unmanned remote systems like satellites are physically remote from their operators, and all control of the spacecraft and data returned by the spacecraft must be transmitted over a wireless radio link. This link may not be available for extended periods when the satellite is out of line of sight of its ground station. Therefore, lightweight, small size and low power consumption hardware is essential for onboard real time processing systems. With increasing dimensionality, size and resolution of recent hyperspectral imaging sensors, additional challenges are posed upon remote sensing processing systems and more capable computing architectures are needed. Graphical Processing Units (GPUs) emerged as promising architecture for light weight high performance computing that can address these computational requirements for onboard systems. The goal of this study is to build high performance methods for onboard hyperspectral analysis. We propose accelerated methods for the well-known recursive hierarchical segmentation (RHSEG) clustering method, using GPUs, hybrid multicore CPU with a GPU and hybrid multi-core CPU/GPU clusters. RHSEG is a method developed by the National Aeronautics and Space Administration (NASA), which is designed to provide rich classification information with several output levels. The achieved speedups by parallel solutions compared to CPU sequential implementations are 21x for parallel single GPU and 240x for hybrid multi-node computer clusters with 16 computing nodes. The energy consumption is reduced to 74% using a single GPU compared to the equivalent parallel CPU cluster.

【2】 Evaluation of Representation Models for Text Classification with AutoML Tools 标题:AutoML工具对文本分类表示模型的评价

作者:Sebastian Brändle,Marc Hanussek,Matthias Blohm,Maximilien Kintz 机构:University of Stuttgart IAT, Institute of Human Factors and Technology Management, Stuttgart, Germany 备注:Accepted for Future Technologies Conference 2021 链接:https://arxiv.org/abs/2106.12798 摘要:近年来,自动机器学习(AutoML)在表格数据方面取得了越来越大的成功。然而,处理非结构化数据(如文本)仍是一个挑战,开源AutoML工具对其支持也不广泛。本文比较了三种手动创建的文本表示和AutoML工具自动创建的文本嵌入。我们的基准测试包括四个流行的开源AutoML工具和八个用于文本分类的数据集。结果表明,直接的文本表示比使用自动创建文本嵌入的AutoML工具性能更好。 摘要:Automated Machine Learning (AutoML) has gained increasing success on tabular data in recent years. However, processing unstructured data like text is a challenge and not widely supported by open-source AutoML tools. This work compares three manually created text representations and text embeddings automatically created by AutoML tools. Our benchmark includes four popular open-source AutoML tools and eight datasets for text classification purposes. The results show that straightforward text representations perform better than AutoML tools with automatically created text embeddings.
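(编者补充)摘要所称"手动创建的文本表示"的一种常见形式是 TF-IDF 加线性分类器,可作为与 AutoML 自动文本嵌入对比的基线。下面是一段最小示意,语料与标签均为示例假设,并非论文的基准配置:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["great product", "terrible service", "loved it", "awful experience"]
labels = [1, 0, 1, 0]                     # 1 = 正面, 0 = 负面(示例数据)

# TF-IDF(含 1-2 元语法)+ 逻辑回归:典型的"直接文本表示"基线
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["service was great"]))
```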

【3】 Frequency Domain Convolutional Neural Network: Accelerated CNN for Large Diabetic Retinopathy Image Classification 标题:频域卷积神经网络:用于大型糖尿病视网膜病变图像分类的加速CNN

作者:Ee Fey Goh,ZhiYuan Chen,Wei Xiang Lim 机构:University of Nottingham Malaysia, School of Computer Science, Jln Broga, Semenyih, Selangor 备注:This paper has been submitted to Neurocomputing 链接:https://arxiv.org/abs/2106.12736 摘要:卷积神经网络(CNN)中的传统空间卷积层计算开销极大,除非减少层数、训练图像数量或训练图像尺寸,否则训练时间可能长达数天。256x256像素的图像大小通常用于CNN的大多数应用,但是这种图像大小对于像糖尿病视网膜病变(DR)分类这样的应用来说太小了,在这些应用中,图像细节对于准确分类非常重要。本研究提出了基于RFFT、核初始化策略、卷积伪影去除和通道无关卷积(CIC)构建的频域卷积(FDC)和频域池化(FDP)层,用以替代传统的卷积层和池化层。利用FDC层和FDP层建立频域卷积神经网络(FDCNN),以加速大图像的训练,用于DR分类。全FDC层是FDC层的扩展,允许在传统CNN中直接使用,它还被用于修改VGG16体系结构。与同等的CNN结构相比,FDCNN至少快54.21%,内存效率高70.74%。与原始VGG16结构相比,带有全FDC层的改进VGG16结构训练时间更短,且准确率更高,达到95.63%。 摘要:The conventional spatial convolution layers in the Convolutional Neural Networks (CNNs) are computationally expensive at the point where the training time could take days unless the number of layers, the number of training images or the size of the training images are reduced. The image size of 256x256 pixels is commonly used for most of the applications of CNN, but this image size is too small for applications like Diabetic Retinopathy (DR) classification where the image details are important for accurate classification. This research proposed Frequency Domain Convolution (FDC) and Frequency Domain Pooling (FDP) layers which were built with RFFT, kernel initialization strategy, convolution artifact removal and Channel Independent Convolution (CIC) to replace the conventional convolution and pooling layers. The FDC and FDP layers are used to build a Frequency Domain Convolutional Neural Network (FDCNN) to accelerate the training of large images for DR classification. The Full FDC layer is an extension of the FDC layer to allow direct use in conventional CNNs, it is also used to modify the VGG16 architecture. FDCNN is shown to be at least 54.21% faster and 70.74% more memory efficient compared to an equivalent CNN architecture. The modified VGG16 architecture with Full FDC layer is reported to achieve a shorter training time and a higher accuracy at 95.63% compared to the original VGG16 architecture for DR classification.
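(编者补充)FDC层的出发点是卷积定理:空域线性卷积等价于(零填充后)频域逐点相乘。下面用 NumPy 的 RFFT 做一个最小数值验证;这只说明用频域运算替代空间卷积的原理,并非论文中 FDC 层的具体实现:

```python
import numpy as np

x = np.random.randn(256)              # 输入信号
k = np.random.randn(31)               # 卷积核
n = len(x) + len(k) - 1               # 零填充到线性卷积长度,避免循环卷积混叠

# 频域逐点相乘 + 逆 RFFT = 线性卷积
y_fft = np.fft.irfft(np.fft.rfft(x, n) * np.fft.rfft(k, n), n)
y_direct = np.convolve(x, k)          # 直接线性卷积作为参考
print(np.allclose(y_fft, y_direct))   # True:二者一致
```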

【4】 Alternative Microfoundations for Strategic Classification 标题:战略分类的另类微观基础

作者:Meena Jagadeesan,Celestine Mendler-Dünner,Moritz Hardt 机构:University of California, Berkeley 备注:Accepted for publication at ICML 2021 链接:https://arxiv.org/abs/2106.12705 摘要:当在机器学习环境下对战略行为进行推理时,很容易将理性主体的标准微观基础与分类的统计决策理论结合起来。在这项工作中,我们认为,直接结合这些标准成分导致脆性解决方案的概念有限的描述性和规定性的价值。首先,我们证明了具有完美信息的理性主体在对决策规则的总体反应中产生了不连续性,而我们通常不会从经验上观察到这种不连续性。第二,当代理的任何正部分都不是完美的策略时,理想的稳定点(分类器对于它所包含的数据是最优的)就不存在了。第三,在标准微观基础下的最优决策规则,在一系列关于代理人行为的可能假设中,使负外部性(即社会负担)的测度最大化。认识到这些局限性,我们探索替代标准的二元分类微基础。我们从描述一组需求开始,这些需求有助于导航关于代理如何响应决策规则的可能假设空间。特别地,我们分析了特征操作的自然约束,并讨论了足以保证稳定点鲁棒存在的性质。在此基础上,我们提出了噪声响应模型。受平滑分析和经验观察的启发,噪声响应在主体响应中加入了缺陷,这减轻了标准微地基的局限性。我们的模型保留了分析的可处理性,导致了关于稳定点的更强有力的见解,并在最优条件下施加了较低的社会负担。 摘要:When reasoning about strategic behavior in a machine learning context it is tempting to combine standard microfoundations of rational agents with the statistical decision theory underlying classification. In this work, we argue that a direct combination of these standard ingredients leads to brittle solution concepts of limited descriptive and prescriptive value. First, we show that rational agents with perfect information produce discontinuities in the aggregate response to a decision rule that we often do not observe empirically. Second, when any positive fraction of agents is not perfectly strategic, desirable stable points -- where the classifier is optimal for the data it entails -- cease to exist. Third, optimal decision rules under standard microfoundations maximize a measure of negative externality known as social burden within a broad class of possible assumptions about agent behavior. Recognizing these limitations we explore alternatives to standard microfoundations for binary classification. We start by describing a set of desiderata that help navigate the space of possible assumptions about how agents respond to a decision rule. In particular, we analyze a natural constraint on feature manipulations, and discuss properties that are sufficient to guarantee the robust existence of stable points. Building on these insights, we then propose the noisy response model. Inspired by smoothed analysis and empirical observations, noisy response incorporates imperfection in the agent responses, which we show mitigates the limitations of standard microfoundations. Our model retains analytical tractability, leads to more robust insights about stable points, and imposes a lower social burden at optimality.

【5】 Handwritten Digit Recognition using Machine and Deep Learning Algorithms 标题:基于机器和深度学习算法的手写体数字识别

作者:Samay Pashine,Ritik Dixit,Rishika Kushwah 机构:Computer Science and Engineering, Acropolis Institute of Technology & Research, Indore, India 备注:None 链接:https://arxiv.org/abs/2106.12614 摘要:人类对机器的依赖程度从未如此之高,以至于从照片中的物体分类到无声电影中添加声音,一切都可以借助深度学习和机器学习算法来完成。同样,手写文本识别是一个重要的研究和开发领域,有许多可能实现。手写识别(HWR),也称为手写文本识别(HTR),是计算机接收和解释来自纸张文档、照片、触摸屏和其他设备的可理解手写输入的能力[1]。显然,在本文中,我们使用支持向量机(SVM)、多层感知器(MLP)和卷积神经网络(CNN)模型,利用MNIST数据集进行手写体数字识别。我们的主要目标是比较上述模型的准确性以及它们的执行时间,以获得最佳的数字识别模型。 摘要:The reliance of humans over machines has never been so high such that from object classification in photographs to adding sound to silent movies everything can be performed with the help of deep learning and machine learning algorithms. Likewise, Handwritten text recognition is one of the significant areas of research and development with a streaming number of possibilities that could be attained. Handwriting recognition (HWR), also known as Handwritten Text Recognition (HTR), is the ability of a computer to receive and interpret intelligible handwritten input from sources such as paper documents, photographs, touch-screens and other devices [1]. Apparently, in this paper, we have performed handwritten digit recognition with the help of MNIST datasets using Support Vector Machines (SVM), Multi-Layer Perceptron (MLP) and Convolution Neural Network (CNN) models. Our main objective is to compare the accuracy of the models stated above along with their execution time to get the best possible model for digit recognition.

【6】 Label Disentanglement in Partition-based Extreme Multilabel Classification 标题:基于划分的极值多标签分类中的标签解缠

作者:Xuanqing Liu,Wei-Cheng Chang,Hsiang-Fu Yu,Cho-Jui Hsieh,Inderjit S. Dhillon 机构:UCLA, Amazon, UT Austin 链接:https://arxiv.org/abs/2106.12751 摘要:基于划分的方法由于其可扩展到较大的输出空间(例如,数百万或更多)而越来越多地用于极端多标签分类(XMC)问题。然而,现有的方法将大的标签空间划分为相互排斥的簇,当标签具有多模态和丰富语义时,这种方法是次优的。例如,标签“Apple”可以是水果,也可以是商标名,这就引出了以下研究问题:我们能否将这些多模式标签与为下游XMC任务定制的非独占聚类分离开来?在本文中,我们证明了基于分区的XMC中的标签分配问题可以表示为一个优化问题,目标是最大化准确率。这就产生了一种高效的算法来形成灵活的重叠标签簇,以及一种可以交替优化基于分区的XMC的簇分配和模型参数的方法。在合成数据集和真实数据集上的实验结果表明,我们的方法可以成功地分离多模态标签,从而在四个XMC基准上得到最新的结果。 摘要:Partition-based methods are increasingly-used in extreme multi-label classification (XMC) problems due to their scalability to large output spaces (e.g., millions or more). However, existing methods partition the large label space into mutually exclusive clusters, which is sub-optimal when labels have multi-modality and rich semantics. For instance, the label "Apple" can be the fruit or the brand name, which leads to the following research question: can we disentangle these multi-modal labels with non-exclusive clustering tailored for downstream XMC tasks? In this paper, we show that the label assignment problem in partition-based XMC can be formulated as an optimization problem, with the objective of maximizing precision rates. This leads to an efficient algorithm to form flexible and overlapped label clusters, and a method that can alternatively optimizes the cluster assignments and the model parameters for partition-based XMC. Experimental results on synthetic and real datasets show that our method can successfully disentangle multi-modal labels, leading to state-of-the-art (SOTA) results on four XMC benchmarks.

表征(2篇)

【1】 Fairness via Representation Neutralization 标题:通过表征中和实现公平

作者:Mengnan Du,Subhabrata Mukherjee,Guanchu Wang,Ruixiang Tang,Ahmed Hassan Awadallah,Xia Hu 机构:Texas A&M University, Microsoft Research 链接:https://arxiv.org/abs/2106.12674 摘要:现有的DNN模型偏差缓解方法主要致力于学习去偏(debiased)编码器。这一过程不仅需要对敏感属性进行大量实例级标注,而且不能保证所有公平性敏感信息都已从编码器中删除。为了解决这些局限性,我们探讨了以下研究问题:即使以有偏表示作为输入,我们能否仅通过对分类头进行去偏来降低DNN模型的歧视性?为此,我们提出了一种新的缓解技术,即表示中和公平性(RNF),它仅通过对DNN模型中特定于任务的分类头进行去偏来实现公平性。具体而言,我们利用具有相同真实标签但敏感属性不同的样本,并使用它们的中和表示来训练DNN模型的分类头。RNF的关键思想是阻止分类头捕获编码器表示中的公平敏感信息与特定类别标签之间的虚假相关性。为了应对无法获得敏感属性标注的低资源场景,我们利用偏差放大模型为敏感属性生成代理标注。在多个基准数据集上的实验结果表明,我们的RNF框架可以有效地降低DNN模型的歧视性,并且任务特定性能的退化最小。 摘要:Existing bias mitigation methods for DNN models primarily work on learning debiased encoders. This process not only requires a lot of instance-level annotations for sensitive attributes, it also does not guarantee that all fairness sensitive information has been removed from the encoder. To address these limitations, we explore the following research question: Can we reduce the discrimination of DNN models by only debiasing the classification head, even with biased representations as inputs? To this end, we propose a new mitigation technique, namely, Representation Neutralization for Fairness (RNF) that achieves fairness by debiasing only the task-specific classification head of DNN models. To this end, we leverage samples with the same ground-truth label but different sensitive attributes, and use their neutralized representations to train the classification head of the DNN model. The key idea of RNF is to discourage the classification head from capturing spurious correlation between fairness sensitive information in encoder representations with specific class labels. To address low-resource settings with no access to sensitive attribute annotations, we leverage a bias-amplified model to generate proxy annotations for sensitive attributes. Experimental results over several benchmark datasets demonstrate our RNF framework to effectively reduce discrimination of DNN models with minimal degradation in task-specific performance.

【2】 Leveraging semantically similar queries for ranking via combining representations 标题:利用语义相似的查询通过组合表示进行排序

作者:Hayden S. Helm,Marah Abdin,Benjamin D. Pedigo,Shweti Mahajan,Vince Lyzinski,Youngser Park,Amitabh Basu,Piali~Choudhury,Christopher M. White,Weiwei Yang,Carey E. Priebe 机构: 1 Microsoft Research 2 Johns Hopkins University 3 University of Maryland 链接:https://arxiv.org/abs/2106.12621 摘要:在现代排序问题中,要排序的项的不同和完全不同的表示常常是可用的。因此,明智的做法是尝试将这些表示结合起来以提高排名。实际上,通过组合表示来学习排序对于学习特定查询的排序函数是有原则和实用的。然而,在数据极度匮乏的情况下,特定查询可用的标记数据量可能会导致高度可变和无效的排序函数。减轻少量数据影响的一种方法是利用语义相似查询中的信息。事实上,正如我们在模拟设置和实际数据示例中所演示的,当语义相似的查询可用时,在对特定查询进行排序时,可以使用它们。我们在偏倚-方差权衡的背景下描述和探索这一现象,并将其应用于Bing导航图和果蝇幼虫连接体的数据稀缺设置。 摘要:In modern ranking problems, different and disparate representations of the items to be ranked are often available. It is sensible, then, to try to combine these representations to improve ranking. Indeed, learning to rank via combining representations is both principled and practical for learning a ranking function for a particular query. In extremely data-scarce settings, however, the amount of labeled data available for a particular query can lead to a highly variable and ineffective ranking function. One way to mitigate the effect of the small amount of data is to leverage information from semantically similar queries. Indeed, as we demonstrate in simulation settings and real data examples, when semantically similar queries are available it is possible to gainfully use them when ranking with respect to a particular query. We describe and explore this phenomenon in the context of the bias-variance trade off and apply it to the data-scarce settings of a Bing navigational graph and the Drosophila larva connectome.

编码器(1篇)

【1】 Encoding Involutory Invariance in Neural Networks 标题:神经网络中对合不变性的编码

作者:Anwesh Bhattacharya,Marios Mattheakis,Pavlos Protopapas 机构:Department of Physics, Department of CS&IS, Birla Institute of Technology & Science Pilani, Pilani, Rajasthan, India; John A. Paulson School of Engineering and Applied Sciences, Harvard University, Cambridge, Massachusetts, United States 备注:19 pages, 12 figures 链接:https://arxiv.org/abs/2106.12891 摘要:在某些情况下,神经网络(NN)的训练数据遵循基本的物理对称性。然而,除非嵌入到网络结构中,否则不能保证NN服从潜在的对称性。在这项工作中,我们探索了一种特殊的对称性:函数在对合线性/仿射变换下,在奇偶性(parity)$p=\pm 1$的意义下保持不变。我们发展了数学定理,并提出了确保不变性和通用逼近性质的神经网络架构。数值实验表明,所提模型在遵守外加对称性的同时,性能优于基线网络。对于具有固有水平/垂直反射对称性的数据集,我们还提出了该技术在卷积神经网络分类任务中的一种适配方案。 摘要:In certain situations, Neural Networks (NN) are trained upon data that obey underlying physical symmetries. However, it is not guaranteed that NNs will obey the underlying symmetry unless embedded in the network structure. In this work, we explore a special kind of symmetry where functions are invariant with respect to involutory linear/affine transformations up to parity $p=\pm 1$. We develop mathematical theorems and propose NN architectures that ensure invariance and universal approximation properties. Numerical experiments indicate that the proposed models outperform baseline networks while respecting the imposed symmetry. An adaption of our technique to convolutional NN classification tasks for datasets with inherent horizontal/vertical reflection symmetry has also been proposed.
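(编者补充)对满足 A·A = I 的对合变换,可以用"对称化"从任意模型 f 构造出严格满足 g(Ax) = p·g(x) 的模型 g:令 g(x) = (f(x) + p·f(Ax))/2。下面的最小示意用随机特征模型验证该恒等式;f 的形式与变换 A 均为示例假设,并非论文中的网络架构:

```python
import numpy as np

A = np.diag([1.0, -1.0])                  # 反射:一个简单的对合线性变换(A @ A = I)
p = -1.0                                  # 奇(odd)情形;p = +1.0 为偶情形

W = np.random.randn(2, 16)
def f(x):                                 # 任意基础模型(示例:随机特征)
    return np.tanh(x @ W).sum(-1)

def g(x):                                 # 对称化后的模型
    return 0.5 * (f(x) + p * f(x @ A.T))

x = np.random.randn(5, 2)
print(np.allclose(g(x @ A.T), p * g(x)))  # True:g 严格满足奇偶不变性
```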

优化|敛散性(4篇)

【1】 Bayesian Optimization with High-Dimensional Outputs 标题:高维输出的贝叶斯优化

作者:Wesley J. Maddox,Maximilian Balandat,Andrew Gordon Wilson,Eytan Bakshy 机构:New York University, Facebook 链接:https://arxiv.org/abs/2106.12997 摘要:贝叶斯优化是一种样本有效的黑盒优化过程,通常适用于具有少量独立目标的问题。然而,在实践中,我们经常希望优化在许多相关结果(或“任务”)上定义的目标。例如,科学家们可能希望优化密集网格中的基站网络覆盖范围。类似地,工程师们可能会寻求通过约束或鲁棒优化来平衡机器人在数十种不同环境中的性能。然而,高斯过程(GP)模型通常被用作多任务贝叶斯优化的概率替代模型,其结果数量的伸缩性较差,极大地限制了其适用性。我们设计了一种有效的精确多任务GP抽样技术,将协方差矩阵中的Kronecker结构与Matheron的恒等式相结合,使得我们能够使用具有成千上万个相关输出的精确多任务GP模型进行贝叶斯优化。通过这样做,与现有方法相比,我们在样本效率方面取得了实质性的改进,这些方法只对结果的聚合函数进行建模。我们演示了如何在科学和工程领域的一系列任务中打开贝叶斯优化的一类新应用,包括优化具有65000多个输出的光学干涉仪的干涉图。 摘要:Bayesian Optimization is a sample-efficient black-box optimization procedure that is typically applied to problems with a small number of independent objectives. However, in practice we often wish to optimize objectives defined over many correlated outcomes (or ``tasks"). For example, scientists may want to optimize the coverage of a cell tower network across a dense grid of locations. Similarly, engineers may seek to balance the performance of a robot across dozens of different environments via constrained or robust optimization. However, the Gaussian Process (GP) models typically used as probabilistic surrogates for multi-task Bayesian Optimization scale poorly with the number of outcomes, greatly limiting applicability. We devise an efficient technique for exact multi-task GP sampling that combines exploiting Kronecker structure in the covariance matrices with Matheron's identity, allowing us to perform Bayesian Optimization using exact multi-task GP models with tens of thousands of correlated outputs. In doing so, we achieve substantial improvements in sample efficiency compared to existing approaches that only model aggregate functions of the outcomes. We demonstrate how this unlocks a new class of applications for Bayesian Optimization across a range of tasks in science and engineering, including optimizing interference patterns of an optical interferometer with more than 65,000 outputs.
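(编者补充)摘要提到的Matheron恒等式给出了"路径式"的GP后验采样:后验样本 = 先验样本 + K_{*X} K_{XX}^{-1}(y − f_X − ε)。下面是单输出情形的最小示意(论文的重点是进一步结合多任务协方差的Kronecker结构来加速,此处不涉及);核函数、数据与噪声水平均为示例假设:

```python
import numpy as np

def rbf(a, b, ell=0.5):
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ell) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0, 1, 8)
y = np.sin(6 * X) + 0.05 * rng.normal(size=8)      # 观测
Xs = np.linspace(0, 1, 100)                        # 预测位置
sigma2 = 0.05 ** 2                                 # 观测噪声方差

# 1) 在 [X, Xs] 上联合抽一条先验样本
Z = np.concatenate([X, Xs])
Lz = np.linalg.cholesky(rbf(Z, Z) + 1e-9 * np.eye(len(Z)))
f_prior = Lz @ rng.normal(size=len(Z))
fX, fXs = f_prior[:len(X)], f_prior[len(X):]

# 2) Matheron 修正:用观测残差把先验样本"拉"成后验样本
eps = np.sqrt(sigma2) * rng.normal(size=len(X))
Kxx = rbf(X, X) + sigma2 * np.eye(len(X))
f_post = fXs + rbf(Xs, X) @ np.linalg.solve(Kxx, y - (fX + eps))
print(f_post.shape)                                # (100,):一条后验函数样本
```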

【2】 Learning Multiple Stock Trading Patterns with Temporal Routing Adaptor and Optimal Transport 标题:利用时态路由适配器和最优传输学习多种股票交易模式

作者:Hengxu Lin,Dong Zhou,Weiqing Liu,Jiang Bian 机构:Sun Yat-sen University, Guangdong, China, Microsoft Research, Beijing, China 备注:Accepted by KDD 2021 (research track) 链接:https://arxiv.org/abs/2106.12950 摘要:成功的量化投资通常依赖于对未来股价走势的精确预测。近年来,基于机器学习的解决方案已显示出其更精确的股票预测能力,并成为现代定量投资系统中不可或缺的组成部分。然而,现有方法背后的i.i.d.假设与股市中存在的多种交易模式不一致,这必然限制了它们获得更好的股票预测性能的能力。在本文中,我们提出了一个新的架构,时间路由适配器(TRA),使现有的股票预测模型能够模拟多种股票交易模式。本质上,TRA是一个轻量级模块,它由一组独立的预测器组成,用于学习多个模式,以及一个路由器,用于向不同的预测器发送样本。然而,由于缺乏明确的模式标识符,因此训练一个有效的基于TRA的模型是相当困难的。为了解决这一问题,我们进一步设计了一种基于最优传输(OT)的学习算法来获得最优的样本到预测器分配,并通过一个辅助损失项来有效地优化具有这种分配的路由器。在真实股票排序任务上的实验表明,与最新的基线相比,该方法可以将信息系数(IC)分别从0.053提高到0.059和0.051提高到0.056。我们在这项工作中使用的数据集和代码是公开的:https://github.com/microsoft/qlib. 摘要:Successful quantitative investment usually relies on precise predictions of the future movement of the stock price. Recently, machine learning based solutions have shown their capacity to give more accurate stock prediction and become indispensable components in modern quantitative investment systems. However, the i.i.d. assumption behind existing methods is inconsistent with the existence of diverse trading patterns in the stock market, which inevitably limits their ability to achieve better stock prediction performance. In this paper, we propose a novel architecture, Temporal Routing Adaptor (TRA), to empower existing stock prediction models with the ability to model multiple stock trading patterns. Essentially, TRA is a lightweight module that consists of a set of independent predictors for learning multiple patterns as well as a router to dispatch samples to different predictors. Nevertheless, the lack of explicit pattern identifiers makes it quite challenging to train an effective TRA-based model. To tackle this challenge, we further design a learning algorithm based on Optimal Transport (OT) to obtain the optimal sample to predictor assignment and effectively optimize the router with such assignment through an auxiliary loss term. Experiments on the real-world stock ranking task show that compared to the state-of-the-art baselines, e.g., Attention LSTM and Transformer, the proposed method can improve information coefficient (IC) from 0.053 to 0.059 and 0.051 to 0.056 respectively. Our dataset and code used in this work are publicly available: https://github.com/microsoft/qlib.

【3】 Exploration-Exploitation in Multi-Agent Competition: Convergence with Bounded Rationality 标题:多智能体竞争中的探索-利用:有限理性下的收敛

作者:Stefanos Leonardos,Georgios Piliouras,Kelly Spendlove 机构:Singapore University of Technology and Design; Mathematical Institute, University of Oxford 链接:https://arxiv.org/abs/2106.12928 摘要:在竞争性多智能体学习中,探索和利用之间的相互作用还远未被很好地理解。基于此,我们研究了光滑Q-学习,这是一个典型的学习模型,它显式刻画了博弈收益和探索成本之间的平衡。我们证明了:在具有正探索率的异质学习主体的加权零和多矩阵博弈中,Q-学习总是收敛到唯一的量子反应均衡(QRE)——有限理性下博弈的标准解概念。作为对近期加权势博弈收敛性结果的补充,我们证明在竞争环境下,Q-学习的快速收敛与智能体数量无关,且无需任何参数微调。正如我们在网络零和博弈中的实验所展示的,这些理论结果为均衡选择这一多智能体竞争环境下目前仍然开放的问题提供了必要的算法保证。 摘要:The interplay between exploration and exploitation in competitive multi-agent learning is still far from being well understood. Motivated by this, we study smooth Q-learning, a prototypical learning model that explicitly captures the balance between game rewards and exploration costs. We show that Q-learning always converges to the unique quantal-response equilibrium (QRE), the standard solution concept for games under bounded rationality, in weighted zero-sum polymatrix games with heterogeneous learning agents using positive exploration rates. Complementing recent results about convergence in weighted potential games, we show that fast convergence of Q-learning in competitive settings is obtained regardless of the number of agents and without any need for parameter fine-tuning. As showcased by our experiments in network zero-sum games, these theoretical results provide the necessary guarantees for an algorithmic approach to the currently open problem of equilibrium selection in competitive multi-agent settings.
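(编者补充)光滑Q-学习中,策略由 Boltzmann(softmax)分布给出,温度即探索率。下面在一个 2x2 零和博弈(匹配硬币)上做最小示意,观察两个玩家的联合学习向 QRE 靠拢;学习率、温度与迭代次数均为示例假设,并非论文中的设置:

```python
import numpy as np

A = np.array([[1.0, -1.0],
              [-1.0, 1.0]])               # 行玩家收益矩阵(匹配硬币,零和)
T, lr = 0.5, 0.1                          # 温度(探索率)与学习率

def softmax(Q):
    e = np.exp(Q / T)
    return e / e.sum()

Qx, Qy = np.zeros(2), np.zeros(2)
for _ in range(5000):
    x, y = softmax(Qx), softmax(Qy)
    Qx += lr * (A @ y - Qx)               # 行玩家:Q 值向当前期望收益平滑靠拢
    Qy += lr * (-A.T @ x - Qy)            # 列玩家:零和,收益取负
print(softmax(Qx), softmax(Qy))           # 本例的 QRE 为均匀混合策略 (0.5, 0.5)
```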

【4】 A Near-Optimal Algorithm for Debiasing Trained Machine Learning Models 标题:一种对已训练机器学习模型进行去偏的近似最优算法

作者:Ibrahim Alabdulmohsin,Mario Lucic 机构:Google Research, Brain Team, Zürich, Switzerland 备注:21 pages, 5 figures 链接:https://arxiv.org/abs/2106.12887 摘要:我们提出了一种可扩展的后处理算法,用于对包括深度神经网络(DNN)在内的已训练模型进行去偏,并通过给出其超额Bayes风险的上界证明了该算法接近最优。我们在标准基准数据集上、横跨经典算法与现代DNN体系结构,实证验证了它的优势,并证明它优于以往的后处理方法,同时与处理中(in-processing)方法性能相当。此外,我们还证明了所提出的算法对于大规模训练的模型特别有效,在这种场景下,后处理是一种自然而实用的选择。 摘要:We present a scalable post-processing algorithm for debiasing trained models, including deep neural networks (DNNs), which we prove to be near-optimal by bounding its excess Bayes risk. We empirically validate its advantages on standard benchmark datasets across both classical algorithms as well as modern DNN architectures and demonstrate that it outperforms previous post-processing methods while performing on par with in-processing. In addition, we show that the proposed algorithm is particularly effective for models trained at scale where post-processing is a natural and practical choice.

预测|估计(6篇)

【1】 FitVid: Overfitting in Pixel-Level Video Prediction 标题:FitVid:像素级视频预测中的过拟合

作者:Mohammad Babaeizadeh,Mohammad Taghi Saffar,Suraj Nair,Sergey Levine,Chelsea Finn,Dumitru Erhan 机构:Google Brain, Stanford University 链接:https://arxiv.org/abs/2106.13195 摘要:一个能够预测接下来会发生什么的代理可以通过计划执行各种任务,而无需额外的训练。此外,这样的代理可以在内部表示真实世界的复杂动态,因此可以获得对各种视觉感知任务有用的表示。这使得预测视频的未来帧,以观察到的过去和潜在的未来行动为条件,成为一项有趣的任务,尽管最近取得了许多进展,但仍然具有特别的挑战性。现有的视频预测模型在简单的狭义基准上取得了很好的效果,但在具有更复杂动力学或更广泛领域的真实数据集上,它们产生的预测质量较低。越来越多的证据表明,对训练数据的拟合不足是导致低质量预测的主要原因之一。在本文中,我们认为,在当前的视频模型中,参数使用效率低下是导致拟合不足的主要原因。因此,我们引入了一个新的体系结构FitVid,它能够在现有的最先进的模型中具有相似的参数计数的同时,对常用的基准进行严重的过度拟合。我们分析了过度拟合的后果,说明了过度拟合如何产生意想不到的结果,例如通过重复训练数据生成高质量的输出,以及如何使用现有的图像增强技术来缓解过度拟合。因此,FitVid在四种不同的视频预测基准上,在四种不同的度量标准上都优于当前最先进的模型。 摘要:An agent that is capable of predicting what happens next can perform a variety of tasks through planning with no additional training. Furthermore, such an agent can internally represent the complex dynamics of the real-world and therefore can acquire a representation useful for a variety of visual perception tasks. This makes predicting the future frames of a video, conditioned on the observed past and potentially future actions, an interesting task which remains exceptionally challenging despite many recent advances. Existing video prediction models have shown promising results on simple narrow benchmarks but they generate low quality predictions on real-life datasets with more complicated dynamics or broader domain. There is a growing body of evidence that underfitting on the training data is one of the primary causes for the low quality predictions. In this paper, we argue that the inefficient use of parameters in the current video models is the main reason for underfitting. Therefore, we introduce a new architecture, named FitVid, which is capable of severe overfitting on the common benchmarks while having similar parameter count as the current state-of-the-art models. We analyze the consequences of overfitting, illustrating how it can produce unexpected outcomes such as generating high quality output by repeating the training data, and how it can be mitigated using existing image augmentation techniques. As a result, FitVid outperforms the current state-of-the-art models across four different video prediction benchmarks on four different metrics.

【2】 Automated Agriculture Commodity Price Prediction System with Machine Learning Techniques 标题:基于机器学习技术的农业商品价格自动预测系统

作者:Zhiyuan Chen,Howe Seng Goh,Kai Ling Sin,Kelly Lim,Nicole Ka Hei Chung,Xin Yu Liew 机构:School of Computer Science, University of Nottingham Malaysia, Semenyih, Malaysia 备注:This paper has been submitted to Advances in Science, Technology and Engineering Systems Journal 链接:https://arxiv.org/abs/2106.12747 摘要:本研究的目的是研究并设计一个基于新型机器学习技术的农产品价格自动预测系统。由于农产品价格的历史数据量不断增加,且需要对价格波动进行准确预测,解决该问题的方法已在很大程度上从统计方法转向机器学习领域。然而,对如何从历史数据中选择合适的子集用于预测,现有考虑仍然有限。另一方面,在实施机器学习技术时,寻找参数全局最优、能处理非线性并避免维数灾难的合适模型仍是最大的挑战,因此需要对机器学习策略进行研究。在本研究中,我们提出一个基于网络的农产品价格自动预测系统。在两个系列的实验中,我们用马来西亚的大型历史数据集比较了ARIMA、SVR、Prophet、XGBoost和LSTM五种流行的机器学习算法,并选择平均均方误差为0.304的最优LSTM模型作为该系统的预测引擎。 摘要:The intention of this research is to study and design an automated agriculture commodity price prediction system with novel machine learning techniques. Due to the increasing large amounts historical data of agricultural commodity prices and the need of performing accurate prediction of price fluctuations, the solution has largely shifted from statistical methods to machine learning area. However, the selection of proper set from historical data for forecasting still has limited consideration. On the other hand, when implementing machine learning techniques, finding a suitable model with optimal parameters for global solution, nonlinearity and avoiding curse of dimensionality are still biggest challenges, therefore machine learning strategies study are needed. In this research, we propose a web-based automated system to predict agriculture commodity price. In the two series experiments, five popular machine learning algorithms, ARIMA, SVR, Prophet, XGBoost and LSTM have been compared with large historical datasets in Malaysia and the most optimal algorithm, LSTM model with an average of 0.304 mean-square error has been selected as the prediction engine of the proposed system.

【3】 On the relationship between predictive coding and backpropagation 标题:关于预测编码与反向传播的关系

作者:Robert Rosenbaum 机构:Department of Applied and Computational Mathematics and Statistics, University of Notre Dame, Notre Dame, IN , USA 链接:https://arxiv.org/abs/2106.13082 摘要:在这篇手稿中,我回顾并扩展了最近在预测编码和反向传播之间的关系方面的工作,以便在监督学习任务中训练人工神经网络。我还讨论了这些结果对于将预测编码和深度神经网络解释为生物学习模型的一些含义,并描述了一个函数库Torch2PC,用于使用PyTorch神经网络模型执行预测编码。 摘要:In this manuscript, I review and extend recent work on the relationship between predictive coding and backpropagation for training artificial neural networks on supervised learning tasks. I also discuss some implications of these results for the interpretation of predictive coding and deep neural networks as models of biological learning and I describe a repository of functions, Torch2PC, for performing predictive coding with PyTorch neural network models.
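(编者补充)下面是监督式预测编码的一个最小示意(两层线性网络):推断阶段对隐层活动做能量下降,权重随后按局部预测误差更新;文中讨论的正是这类更新与反向传播梯度之间的近似/等价关系。网络规模与步长均为示例假设,实际实验可参考文中提到的 Torch2PC:

```python
import numpy as np

rng = np.random.default_rng(0)
W1 = 0.3 * rng.normal(size=(4, 3))         # 输入(3) -> 隐层(4)
W2 = 0.3 * rng.normal(size=(2, 4))         # 隐层(4) -> 输出(2)
x0 = rng.normal(size=3)                    # 输入
t = rng.normal(size=2)                     # 目标(训练时输出层被钳制为 t)

# 能量 F = ||x1 - W1 x0||^2 / 2 + ||t - W2 x1||^2 / 2
x1 = W1 @ x0                               # 隐层活动初始化为前馈预测
for _ in range(50):                        # 推断:对 x1 做 F 的梯度下降
    e1 = x1 - W1 @ x0                      # 第 1 层预测误差
    e2 = t - W2 @ x1                       # 第 2 层预测误差
    x1 += 0.1 * (-e1 + W2.T @ e2)          # 沿 -dF/dx1 方向更新

lr = 0.01                                  # 学习:局部(Hebb 式)权重更新
W1 += lr * np.outer(e1, x0)                # -dF/dW1 方向对应 e1 x0^T
W2 += lr * np.outer(e2, x1)                # -dF/dW2 方向对应 e2 x1^T
```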

【4】 Next-Day Bitcoin Price Forecast Based on Artificial intelligence Methods 标题:基于人工智能方法的次日比特币价格预测

作者:Liping Yang 机构:School of Economics and Management, Beijing University of Chemical Technology 链接:https://arxiv.org/abs/2106.12961 摘要:近年来,比特币价格预测引起了研究者和投资者的兴趣。然而,以往研究的准确性还不够好。机器学习和深度学习方法在这方面具有很强的预测能力。本文提出了一种结合集成经验模式分解(EEMD)和长短时记忆(LSTM)的深度学习方法来研究次日比特币价格预测问题。 摘要:In recent years, Bitcoin price prediction has attracted the interest of researchers and investors. However, the accuracy of previous studies is not well enough. Machine learning and deep learning methods have been proved to have strong prediction ability in this area. This paper proposed a method combined with Ensemble Empirical Mode Decomposition (EEMD) and a deep learning method called long short-term memory (LSTM) to research the problem of next-day Bitcoin price forecast.

【5】 Machine learning structure preserving brackets for forecasting irreversible processes 标题:用于预测不可逆过程的机器学习结构保持括号

作者:Kookjin Lee,Nathaniel A. Trask,Panos Stinis 机构:School of Computing, Informatics, and Decision Systems Engineering, Arizona State University, Tempe, AZ; Center for Computing Research, Sandia National Laboratories, Albuquerque, NM; Pacific Northwest National Laboratory, Richland, WA 链接:https://arxiv.org/abs/2106.12619 摘要:时间序列数据的预测需要施加归纳偏置才能获得可外推的预测;最近的工作通过施加哈密顿/拉格朗日形式来保持可逆动力系统的结构。在这项工作中,我们提出了一种源自metriplectic动力系统的耗散括号的新参数化,适用于在模型形式先验未知时学习不可逆动力学。该过程学习能量与熵的广义Casimir函数,分别保证能量守恒与熵非减。此外,对于附加热噪声的情况,我们保证了涨落耗散定理的精确保持,从而确保热力学一致性。我们在耗散系统上给出了基准测试,证明所学得的动力学比"黑箱"或基于惩罚的方法更鲁棒、泛化能力更强。 摘要:Forecasting of time-series data requires imposition of inductive biases to obtain predictive extrapolation, and recent works have imposed Hamiltonian/Lagrangian form to preserve structure for systems with reversible dynamics. In this work we present a novel parameterization of dissipative brackets from metriplectic dynamical systems appropriate for learning irreversible dynamics with unknown a priori model form. The process learns generalized Casimirs for energy and entropy guaranteed to be conserved and nondecreasing, respectively. Furthermore, for the case of added thermal noise, we guarantee exact preservation of a fluctuation-dissipation theorem, ensuring thermodynamic consistency. We provide benchmarks for dissipative systems demonstrating learned dynamics are more robust and generalize better than either "black-box" or penalty-based approaches.
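(编者补充)metriplectic 系统的形式为 dz/dt = L∇E + M∇S,其中 L 反对称、M 对称半正定,并满足简并条件 L∇S = 0、M∇E = 0;由此自动得到 dE/dt = 0(能量守恒)与 dS/dt ≥ 0(熵非减)——这正是文中括号参数化要"从构造上"保持的性质。下面用一个手工构造的 3 维玩具系统做数值验证(编者示例,并非论文的学习模型):

```python
import numpy as np

def gradE(z): return np.array([z[0], z[1], 0.0])   # E = (z1^2 + z2^2) / 2
def gradS(z): return np.array([0.0, 0.0, 1.0])     # S = z3

L = np.array([[0.0, 1.0, 0.0],
              [-1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0]])                    # 反对称,且 L ∇S = 0

rng = np.random.default_rng(1)
for _ in range(3):
    z = rng.normal(size=3)
    w = np.array([z[1], -z[0], 1.0])               # w ⊥ ∇E,故 M ∇E = 0
    M = np.outer(w, w)                             # 对称半正定
    dz = L @ gradE(z) + M @ gradS(z)
    print(gradE(z) @ dz, gradS(z) @ dz)            # 前者 ≈ 0;后者 = (w·∇S)^2 ≥ 0
```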

【6】 Real-time gravitational-wave science with neural posterior estimation 标题:神经后验估计的实时引力波科学

作者:Maximilian Dax,Stephen R. Green,Jonathan Gair,Jakob H. Macke,Alessandra Buonanno,Bernhard Schölkopf 机构:Max Planck Institute for Intelligent Systems, Max-Planck-Ring, Tübingen, Germany; Max Planck Institute for Gravitational Physics (Albert Einstein Institute), Am Mühlenberg, Potsdam, Germany 备注:7+12 pages, 4+11 figures 链接:https://arxiv.org/abs/2106.12594 摘要:我们用深度学习实现了精度前所未有的快速引力波参数估计。使用神经网络作为贝叶斯后验分布的替代,我们分析了第一个LIGO-Virgo引力波瞬态目录中的8个引力波事件,发现其结果与标准推理代码在数值上高度吻合,但每个事件的推理时间从天的量级缩短到约一分钟。我们的网络使用模拟数据进行训练,其中包括对事件附近探测器噪声特性的估计。这种方法把信号和噪声模型编码在数百万个神经网络参数之中,并且可以对任何与训练分布一致的观测数据进行推断,从而考虑了事件间噪声的非平稳性。我们的算法——被称为"DINGO"——在快速而准确地推断已探测引力波事件的物理参数方面树立了新标准,它应能在不牺牲精度的情况下实现实时数据分析。 摘要:We demonstrate unprecedented accuracy for rapid gravitational-wave parameter estimation with deep learning. Using neural networks as surrogates for Bayesian posterior distributions, we analyze eight gravitational-wave events from the first LIGO-Virgo Gravitational-Wave Transient Catalog and find very close quantitative agreement with standard inference codes, but with inference times reduced from O(day) to a minute per event. Our networks are trained using simulated data, including an estimate of the detector-noise characteristics near the event. This encodes the signal and noise models within millions of neural-network parameters, and enables inference for any observed data consistent with the training distribution, accounting for noise nonstationarity from event to event. Our algorithm -- called "DINGO" -- sets a new standard in fast-and-accurate inference of physical parameters of detected gravitational-wave events, which should enable real-time data analysis without sacrificing accuracy.

其他神经网络|深度学习|模型|建模(24篇)

【1】 Learning Language and Multimodal Privacy-Preserving Markers of Mood from Mobile Data 标题:移动数据中的语言学习与多模态隐私保护情绪标记物

作者:Paul Pu Liang,Terrance Liu,Anna Cai,Michal Muszynski,Ryo Ishii,Nicholas Allen,Randy Auerbach,David Brent,Ruslan Salakhutdinov,Louis-Philippe Morency 机构:Carnegie Mellon University, University of Oregon, Columbia University, University of Pittsburgh 备注:ACL 2021. arXiv admin note: substantial text overlap with arXiv:2012.02359 链接:https://arxiv.org/abs/2106.13213 摘要:即使在普遍享有先进医疗服务的国家,精神健康状况仍然诊断不足。从易于收集的数据中准确有效地预测情绪的能力,对于心理健康障碍的早期发现、干预和治疗有着重要的意义。一个有希望帮助监测人类行为的数据源是智能手机的日常使用。但是,在对行为进行摘要时必须小心,避免通过个人属性(例如个人可识别信息)或受保护属性(例如种族、性别)识别出用户。在本文中,我们使用一个来自自杀行为高风险青少年群体的最新移动行为数据集,研究日常情绪的行为标记。借助计算模型,我们发现语言以及移动端键入文本的多模态表示(涵盖键入的字符、单词、击键时间和应用使用情况)可以预测日常情绪。然而,我们发现训练用来预测情绪的模型通常也会在其中间表示中捕获私人用户身份。为了解决这个问题,我们评估了在保持预测能力的同时混淆用户身份的方法。通过将多模态表示与隐私保护学习相结合,我们能够推进性能-隐私前沿。 摘要:Mental health conditions remain underdiagnosed even in countries with common access to advanced medical care. The ability to accurately and efficiently predict mood from easily collectible data has several important implications for the early detection, intervention, and treatment of mental health disorders. One promising data source to help monitor human behavior is daily smartphone usage. However, care must be taken to summarize behaviors without identifying the user through personal (e.g., personally identifiable information) or protected (e.g., race, gender) attributes. In this paper, we study behavioral markers of daily mood using a recent dataset of mobile behaviors from adolescent populations at high risk of suicidal behaviors. Using computational models, we find that language and multimodal representations of mobile typed text (spanning typed characters, words, keystroke timings, and app usage) are predictive of daily mood. However, we find that models trained to predict mood often also capture private user identities in their intermediate representations. To tackle this problem, we evaluate approaches that obfuscate user identity while remaining predictive. By combining multimodal representations with privacy-preserving learning, we are able to push forward the performance-privacy frontier.

【2】 Towards Fully Interpretable Deep Neural Networks: Are We There Yet? 标题:走向完全可解释的深度神经网络:我们到了吗?

作者:Sandareka Wickramanayake,Wynne Hsu,Mong Li Lee 机构:School of Computing, National University of Singapore; Institute of Data Science, National University of Singapore 备注:Presented at the ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.13164 摘要:尽管深度神经网络(DNN)性能卓越,但其黑箱特性阻碍了用户对人工智能(AI)系统的信任。对打开DNN黑箱的研究大致可分为事后解释方法和内在可解释DNN两类。虽然已有许多针对事后解释方法的综述,但很少有工作专门讨论内在可解释的DNN。本文综述了构建具有内在可解释性的神经网络的现有方法,重点介绍卷积神经网络(CNN)。目的是了解在构建能够满足不同解释需求的完全可解释DNN方面的当前进展。最后,我们指出了现有工作中的差距,并提出了潜在的研究方向。 摘要:Despite the remarkable performance, Deep Neural Networks (DNNs) behave as black-boxes hindering user trust in Artificial Intelligence (AI) systems. Research on opening black-box DNN can be broadly categorized into post-hoc methods and inherently interpretable DNNs. While many surveys have been conducted on post-hoc interpretation methods, little effort is devoted to inherently interpretable DNNs. This paper provides a review of existing methods to develop DNNs with intrinsic interpretability, with a focus on Convolutional Neural Networks (CNNs). The aim is to understand the current progress towards fully interpretable DNNs that can cater to different interpretation requirements. Finally, we identify gaps in current work and suggest potential research directions.

【3】 Fold2Seq: A Joint Sequence(1D)-Fold(3D) Embedding-based Generative Model for Protein Design 标题:Fold2Seq:一种基于联合序列(1D)-折叠(3D)嵌入的蛋白质设计产生式模型

作者:Yue Cao,Payel Das,Vijil Chenthamarakshan,Pin-Yu Chen,Igor Melnyk,Yang Shen 机构: 1IBM Research 2Texas A&M University 备注:ICML 2021 链接:https://arxiv.org/abs/2106.13058 摘要:在蛋白质工程中,设计新的蛋白质序列以获得理想的三维拓扑折叠是一项重要的任务。由于复杂的序列-褶皱关系,以及难以捕捉褶皱内序列的多样性(因此结构和功能),因此存在挑战。为了克服这些挑战,我们提出了Fold2Seq,一种新的基于Transformer的生成框架,用于设计以特定目标折叠为条件的蛋白质序列。为了模拟复杂的序列-结构关系,Fold2Seq联合学习了一个使用Transformer的序列嵌入和一个从三维体素的二级结构元素的密度折叠嵌入。在单个褶皱的单一、高分辨率和完整结构输入的测试集上,我们的实验证明了Fold2Seq在速度、覆盖率和序列设计可靠性方面的改进或可比性能,与现有的最先进的方法相比,这些方法包括数据驱动的深层生成模型和基于物理的设计。与基于结构的深层模型和RosettaDesign相比,基于fold的Fold2Seq的独特优势在另外三个源于低质量、不完整或模棱两可的输入结构的现实世界挑战上变得更加明显。源代码和数据可在https://github.com/IBM/fold2seq. 摘要:Designing novel protein sequences for a desired 3D topological fold is a fundamental yet non-trivial task in protein engineering. Challenges exist due to the complex sequence--fold relationship, as well as the difficulties to capture the diversity of the sequences (therefore structures and functions) within a fold. To overcome these challenges, we propose Fold2Seq, a novel transformer-based generative framework for designing protein sequences conditioned on a specific target fold. To model the complex sequence--structure relationship, Fold2Seq jointly learns a sequence embedding using a transformer and a fold embedding from the density of secondary structural elements in 3D voxels. On test sets with single, high-resolution and complete structure inputs for individual folds, our experiments demonstrate improved or comparable performance of Fold2Seq in terms of speed, coverage, and reliability for sequence design, when compared to existing state-of-the-art methods that include data-driven deep generative models and physics-based RosettaDesign. The unique advantages of fold-based Fold2Seq, in comparison to a structure-based deep model and RosettaDesign, become more evident on three additional real-world challenges originating from low-quality, incomplete, or ambiguous input structures. Source code and data are available at https://github.com/IBM/fold2seq.

【4】 Towards Biologically Plausible Convolutional Networks 标题:走向生物学上看似合理的卷积网络

作者:Roman Pogodin,Yash Mehta,Timothy P. Lillicrap,Peter E. Latham 机构:Gatsby Unit, UCL, DeepMind; CoMPLEX, UCL 链接:https://arxiv.org/abs/2106.13031 摘要:卷积网络在深度学习中无处不在。它们对于图像特别有用,因为它们减少了参数数量、缩短了训练时间并提高了准确性。然而,作为大脑的模型,它们存在严重的问题,因为它们需要权重共享——这是真实神经元根本无法做到的。因此,虽然大脑中的神经元可以局部连接(卷积网络的特征之一),但它们不能做卷积。然而,局部连接但非卷积的网络性能明显低于卷积网络。这对于使用卷积网络来解释视觉系统活动的研究来说是个麻烦。在这里,我们研究了权重共享的可行替代方案,其目标是同样的正则化原则,即让同一池内的每个神经元对相同输入做出相似的反应。最自然的做法是向网络展示同一图像的多个平移版本,类似于动物视觉中的扫视。但是,这种方法需要大量平移,而且并不能消除性能差距。相反,我们建议在局部连接的网络中加入侧向连接,并允许通过Hebb可塑性进行学习。这需要网络偶尔暂停,进入一个类似睡眠的"权重共享"阶段。这种方法使局部连接的网络在ImageNet上获得接近卷积网络的性能,从而支持卷积网络作为视觉流的模型。 摘要:Convolutional networks are ubiquitous in deep learning. They are particularly useful for images, as they reduce the number of parameters, reduce training time, and increase accuracy. However, as a model of the brain they are seriously problematic, since they require weight sharing - something real neurons simply cannot do. Consequently, while neurons in the brain can be locally connected (one of the features of convolutional networks), they cannot be convolutional. Locally connected but non-convolutional networks, however, significantly underperform convolutional ones. This is troublesome for studies that use convolutional networks to explain activity in the visual system. Here we study plausible alternatives to weight sharing that aim at the same regularization principle, which is to make each neuron within a pool react similarly to identical inputs. The most natural way to do that is by showing the network multiple translations of the same image, akin to saccades in animal vision. However, this approach requires many translations, and doesn't remove the performance gap. We propose instead to add lateral connectivity to a locally connected network, and allow learning via Hebbian plasticity. This requires the network to pause occasionally for a sleep-like phase of "weight sharing". This method enables locally connected networks to achieve nearly convolutional performance on ImageNet, thus supporting convolutional networks as a model of the visual stream.
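(编者补充)摘要中"睡眠阶段的权重共享"可以理解为:局部连接层平时各位置独立学习,周期性地把同一滤波器池内的权重拉回其均值。下面是该目标效果的最小示意(论文中是经由侧向连接与 Hebb 可塑性实现的,此处仅演示结果而非机制);权重维度与收缩强度均为示例假设:

```python
import numpy as np

positions, k = 10, 9                          # 10 个空间位置,每处 9 维局部权重
local_w = np.random.randn(positions, k)       # 非卷积:各位置权重相互独立

def sleep_phase(w, strength=1.0):
    """把池内各位置权重向其均值收缩;strength=1 时完全共享。"""
    mean = w.mean(axis=0, keepdims=True)
    return (1.0 - strength) * w + strength * mean

local_w = sleep_phase(local_w)
print(np.allclose(local_w[0], local_w[-1]))   # True:"睡眠"后各位置权重一致
```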

【5】 Objective discovery of dominant dynamical processes with intelligible machine learning 标题:基于可理解机器学习的主导动态过程的客观发现

作者:Bryan E. Kaiser,Juan A. Saenz,Maike Sonnewald,Daniel Livescu 机构:Los Alamos National Laboratory, X-Computational Physics Division (XCP); Princeton University, Program in Atmospheric and Oceanic Sciences, Princeton, NJ, USA; NOAA/OAR Geophysical Fluid Dynamics Laboratory, Ocean and Cryosphere Division 备注:21 pages, 7 figures 链接:https://arxiv.org/abs/2106.12963 摘要:大数据的出现为从气候科学到医学等自然现象的发现带来了巨大的潜力,但压倒性的复杂性阻碍了洞察。现有理论往往不能简明扼要地描述突出的现象,研究进展在很大程度上依赖于对动力学区域(dynamical regimes)的临时性定义来指导和聚焦探索。我们提出了一个形式化定义,将动力学区域的识别表述为一个优化问题,并提出了一个可理解的目标函数。此外,我们提出了一个无监督学习框架,消除了对先验知识和临时性定义的需要;用户只需选择合适的聚类和降维算法,并且可以用我们提出的目标函数来指导这种选择。我们用海洋动力学、肿瘤血管生成和湍流边界层的例子说明了其适用性。我们的方法是朝着无偏数据探索迈出的一步,它允许在动力系统中进行偶然发现,有可能推动物理科学向前发展。 摘要:The advent of big data has vast potential for discovery in natural phenomena ranging from climate science to medicine, but overwhelming complexity stymies insight. Existing theory is often not able to succinctly describe salient phenomena, and progress has largely relied on ad hoc definitions of dynamical regimes to guide and focus exploration. We present a formal definition in which the identification of dynamical regimes is formulated as an optimization problem, and we propose an intelligible objective function. Furthermore, we propose an unsupervised learning framework which eliminates the need for a priori knowledge and ad hoc definitions; instead, the user need only choose appropriate clustering and dimensionality reduction algorithms, and this choice can be guided using our proposed objective function. We illustrate its applicability with example problems drawn from ocean dynamics, tumor angiogenesis, and turbulent boundary layers. Our method is a step towards unbiased data exploration that allows serendipitous discovery within dynamical systems, with the potential to propel the physical sciences forward.

【6】 Using machine learning techniques to predict hospital admission at the emergency department 标题:利用机器学习技术预测急诊科入院人数

作者:Georgios Feretzakis,George Karlis,Evangelos Loupelis,Dimitris Kalles,Rea Chatzikyriakou,Nikolaos Trakas,Eugenia Karakou,Aikaterini Sakagianni,Lazaros Tzelves,Stavroula Petropoulou,Aikaterini Tika,Ilias Dalainas,Vasileios Kaldis 机构:a School of Science and Technology, Hellenic Open University, Patras, Greece, b Sismanogleio General Hospital, IT department, Marousi, Greece, c Sismanogleio General Hospital, Department of Quality Control, Research and, Continuing Education, Marousi, Greece 备注:20 pages, 2 figures 链接:https://arxiv.org/abs/2106.12921 摘要:导言:急诊科最重要的任务之一是及时确定哪些病人将从入院中受益。机器学习(ML)技术在医疗保健领域显示出作为诊断辅助手段的前景。材料和方法:我们调查了以下特征,以探讨其在预测入院方面的表现:血清尿素、肌酐、乳酸脱氢酶、肌酸激酶、C-反应蛋白、全血分类计数、活化部分凝血活酶时间、D-二聚体、国际标准化比值,年龄、性别、急诊科的分诊情况和救护车的使用情况。共分析了3204例急诊就诊。结果:提出的算法生成的模型在预测急诊病人入院方面表现出了可接受的性能。8种算法的F-measure和ROC-Area值范围分别为[0.679-0.708]和[0.734-0.774]。讨论:此工具的主要优点包括易于访问、可用性、是/否结果和低成本。我们的方法的临床意义可能有助于从传统的临床决策向更复杂的模式转变。结论:利用常见的生物标志物建立可靠的预后模型是一个可能影响急诊医学未来发展的项目。我们的发现证实了实用主义ED试验的实施。 摘要:Introduction: One of the most important tasks in the Emergency Department (ED) is to promptly identify the patients who will benefit from hospital admission. Machine Learning (ML) techniques show promise as diagnostic aids in healthcare. Material and methods: We investigated the following features seeking to investigate their performance in predicting hospital admission: serum levels of Urea, Creatinine, Lactate Dehydrogenase, Creatine Kinase, C-Reactive Protein, Complete Blood Count with differential, Activated Partial Thromboplastin Time, D Dimer, International Normalized Ratio, age, gender, triage disposition to ED unit and ambulance utilization. A total of 3,204 ED visits were analyzed. Results: The proposed algorithms generated models which demonstrated acceptable performance in predicting hospital admission of ED patients. The range of F-measure and ROC Area values of all eight evaluated algorithms were [0.679-0.708] and [0.734-0.774], respectively. Discussion: The main advantages of this tool include easy access, availability, yes/no result, and low cost. The clinical implications of our approach might facilitate a shift from traditional clinical decision-making to a more sophisticated model. Conclusion: Developing robust prognostic models with the utilization of common biomarkers is a project that might shape the future of emergency medicine. Our findings warrant confirmation with implementation in pragmatic ED trials.

【7】 Recurrent Neural Network from Adder's Perspective: Carry-lookahead RNN 标题:加法器视角下的递归神经网络:超前进位RNN

作者:Haowei Jiang,Feiwei Qin,Jin Cao,Yong Peng,Yanli Shao 机构:School of Computer Science and Technology, Hangzhou Dianzi University, China, Whiting School of Engineering, Johns Hopkins University, USA 链接:https://arxiv.org/abs/2106.12901 摘要:递归网络结构是序列建模中广泛使用的一种模型,但它的串行依赖性阻碍了计算的并行化,导致运算效率低下。在数字电子学的早期阶段,串行加法器也遇到了同样的问题。本文讨论了递归神经网络(RNN)与串行加法器的相似性。受进位先行加法器的启发,我们将进位先行模块引入到RNN中,使RNN能够并行运行。在此基础上,设计了并行RNN计算方法,提出了进位先行RNN(CL-RNN)。CL-RNN具有并行性好、感受野灵活等优点。通过一组全面的测试,我们验证了CL-RNN在为RNN设计的序列建模任务中的性能优于现有的典型RNN。 摘要:The recurrent network architecture is a widely used model in sequence modeling, but its serial dependency hinders the computation parallelization, which makes the operation inefficient. The same problem was encountered in serial adder at the early stage of digital electronics. In this paper, we discuss the similarities between recurrent neural network (RNN) and serial adder. Inspired by carry-lookahead adder, we introduce carry-lookahead module to RNN, which makes it possible for RNN to run in parallel. Then, we design the method of parallel RNN computation, and finally Carry-lookahead RNN (CL-RNN) is proposed. CL-RNN takes advantages in parallelism and flexible receptive field. Through a comprehensive set of tests, we verify that CL-RNN can perform better than existing typical RNNs in sequence modeling tasks which are specially designed for RNNs.
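(编者补充)"超前进位"思想之所以能并行化 RNN,根本原因与加法器相同:线性递归 h_t = a_t·h_{t-1} + b_t 的逐步复合是一个可结合算子,因而可以用 O(log T) 步的并行前缀扫描完成。下面是该原理的最小示意(CL-RNN 的具体结构见论文,此处仅演示并行化的数学基础):

```python
import numpy as np

def combine(x, y):                         # 先作用 x=(ax,bx),再作用 y=(ay,by)
    ax, bx = x
    ay, by = y
    return (ay * ax, ay * bx + by)         # 可结合:对应 h -> ay(ax h + bx) + by

def parallel_scan(pairs):                  # Hillis-Steele 包含式前缀扫描
    pairs = list(pairs)
    step = 1
    while step < len(pairs):
        nxt = list(pairs)
        for t in range(step, len(pairs)):  # 该层内各 t 之间相互独立,可并行
            nxt[t] = combine(pairs[t - step], pairs[t])
        pairs, step = nxt, step * 2
    return pairs

T = 8
a, b = 0.9 * np.random.rand(T), np.random.randn(T)
h, h_seq = 0.0, []
for t in range(T):                         # 串行参考实现
    h = a[t] * h + b[t]
    h_seq.append(h)
scanned = parallel_scan(zip(a, b))         # 每项为 (A_t, B_t):h_t = A_t h_0 + B_t
print(np.allclose([B for _, B in scanned], h_seq))   # h_0 = 0 时二者一致
```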

【8】 A Construction Kit for Efficient Low Power Neural Network Accelerator Designs 标题:一种用于高效低功耗神经网络加速器设计的构造工具包

作者:Petar Jokic,Erfan Azarkhish,Andrea Bonetti,Marc Pons,Stephane Emery,Luca Benini 机构:CSEM SA, Zurich, Switzerland; ETH Zurich, Zurich, Switzerland 链接:https://arxiv.org/abs/2106.12810 摘要:在边缘实现嵌入式神经网络处理需要高效的硬件加速,以耦合高计算性能和低功耗。在网络体系结构及其算法特性快速发展的推动下,加速器的设计不断更新和改进。为了评估和比较硬件设计选择,设计者可以参考文献中大量的加速器实现。调查提供了这些工作的概述,但通常限于系统级和特定于基准的性能指标,因此很难定量比较每种使用的优化技术的单独效果。这使得新加速器设计的优化评估复杂化,减缓了研究进展。这项工作对近期工作中使用的神经网络加速器优化方法进行了综述,并报告了各项优化对边缘处理性能的单独影响。它以"构建工具包"的形式呈现优化列表及其量化效果,允许分别评估每个构建块的设计选择。报告的优化范围从高达10000倍的内存节省到33倍的能耗降低,为芯片设计师提供了实现高效低功耗神经网络加速器的设计选择概述。 摘要:Implementing embedded neural network processing at the edge requires efficient hardware acceleration that couples high computational performance with low power consumption. Driven by the rapid evolution of network architectures and their algorithmic features, accelerator designs are constantly updated and improved. To evaluate and compare hardware design choices, designers can refer to a myriad of accelerator implementations in the literature. Surveys provide an overview of these works but are often limited to system-level and benchmark-specific performance metrics, making it difficult to quantitatively compare the individual effect of each utilized optimization technique. This complicates the evaluation of optimizations for new accelerator designs, slowing-down the research progress. This work provides a survey of neural network accelerator optimization approaches that have been used in recent works and reports their individual effects on edge processing performance. It presents the list of optimizations and their quantitative effects as a construction kit, allowing to assess the design choices for each building block separately. Reported optimizations range from up to 10'000x memory savings to 33x energy reductions, providing chip designers an overview of design choices for implementing efficient low power neural network accelerators.

【9】 Hamiltonian-based Neural ODE Networks on the SE(3) Manifold For Dynamics Learning and Control 标题:SE(3)流形上基于哈密顿的动力学学习与控制神经常微分方程网络

作者:Thai Duong,Nikolay Atanasov 机构:Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA, USA 备注:Accepted to RSS 2021. Website: this https URL 链接:https://arxiv.org/abs/2106.12782 摘要:精确的机器人动力学模型对于机器人的安全稳定控制和向新工况的泛化至关重要。然而,手工设计的模型即使经过仔细的参数调整也可能不够精确。这促使人们使用机器学习技术,在状态-控制轨迹训练集上近似机器人动力学。包括地面、空中和水下载具在内的许多机器人,其动力学都以SE(3)位姿和广义速度描述,并满足能量守恒原理。本文在SE(3)流形上提出了一种哈密顿形式,并将其嵌入神经常微分方程(ODE)网络结构,用以逼近刚体动力学。与黑箱ODE网络不同,我们的公式在构造上即保证了总能量守恒。我们针对学习得到的、可能欠驱动的SE(3)哈密顿动力学开发了能量整形与阻尼注入控制,从而为包括单摆、刚体和四旋翼系统在内的多种平台提供了统一的镇定与轨迹跟踪方法。 摘要:Accurate models of robot dynamics are critical for safe and stable control and generalization to novel operational conditions. Hand-designed models, however, may be insufficiently accurate, even after careful parameter tuning. This motivates the use of machine learning techniques to approximate the robot dynamics over a training set of state-control trajectories. The dynamics of many robots, including ground, aerial, and underwater vehicles, are described in terms of their SE(3) pose and generalized velocity, and satisfy conservation of energy principles. This paper proposes a Hamiltonian formulation over the SE(3) manifold of the structure of a neural ordinary differential equation (ODE) network to approximate the dynamics of a rigid body. In contrast to a black-box ODE network, our formulation guarantees total energy conservation by construction. We develop energy shaping and damping injection control for the learned, potentially under-actuated SE(3) Hamiltonian dynamics to enable a unified approach for stabilization and trajectory tracking with various platforms, including pendulum, rigid-body, and quadrotor systems.
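下面是该思路核心的一个极简示意(基于对摘要的理解,并非论文的SE(3)实现):用神经网络参数化哈密顿量 $H(q,p)$,通过自动微分得到哈密顿方程右端,再用RK4积分;能量守恒由哈密顿结构本身保证。

```python
import torch
import torch.nn as nn

class HamiltonianDynamics(nn.Module):
    """示意:用MLP学习 H(q, p),动力学由哈密顿方程给出。"""
    def __init__(self, dim):
        super().__init__()
        self.H = nn.Sequential(nn.Linear(2 * dim, 64), nn.Tanh(), nn.Linear(64, 1))

    def forward(self, q, p):
        qp = torch.cat([q, p], dim=-1)
        if not qp.requires_grad:
            qp.requires_grad_(True)  # 此时 qp 是叶子张量,可安全开启梯度
        H = self.H(qp).sum()
        dH = torch.autograd.grad(H, qp, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        return dHdp, -dHdq  # dq/dt = dH/dp, dp/dt = -dH/dq

def rk4_step(f, q, p, dt):
    # 学习到的哈密顿向量场的一步RK4积分
    k1q, k1p = f(q, p)
    k2q, k2p = f(q + 0.5 * dt * k1q, p + 0.5 * dt * k1p)
    k3q, k3p = f(q + 0.5 * dt * k2q, p + 0.5 * dt * k2p)
    k4q, k4p = f(q + dt * k3q, p + dt * k3p)
    return (q + dt / 6 * (k1q + 2 * k2q + 2 * k3q + k4q),
            p + dt / 6 * (k1p + 2 * k2p + 2 * k3p + k4p))
```

训练时可将RK4滚动得到的轨迹与观测轨迹做均方误差回传;论文中针对SE(3)位姿与欠驱动控制的处理要复杂得多,此处仅展示"网络只输出标量H、动力学由微分结构导出"这一关键点。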

【10】 Task-agnostic Continual Learning with Hybrid Probabilistic Models 标题:基于混合概率模型的任务不可知性持续学习

作者:Polina Kirichenko,Mehrdad Farajtabar,Dushyant Rao,Balaji Lakshminarayanan,Nir Levine,Ang Li,Huiyi Hu,Andrew Gordon Wilson,Razvan Pascanu 机构: New York University, DeepMind, Google Research 链接:https://arxiv.org/abs/2106.12772 摘要:在不断变化的数据分布上持续学习新任务而不遗忘,对于现实世界中的问题至关重要,但对现代深度学习来说极具挑战性。在这项工作中,我们提出了HCL,一种用于分类的混合生成-判别式持续学习方法。我们用一个归一化流(normalizing flow)来建模每个任务和每个类的分布。该流用于学习数据分布、执行分类、识别任务变化和避免遗忘,所有这些都利用了归一化流模型所特有的可逆性和精确似然。我们利用流的生成能力,通过生成式重放和一种新的函数正则化技术来避免灾难性遗忘。对于任务识别,我们使用最先进的异常检测技术,基于度量模型统计量的典型性(typicality)。我们展示了HCL在一系列持续学习基准(如split MNIST、split CIFAR和SVHN-MNIST)上的强大性能。 摘要:Learning new tasks continuously without forgetting on a constantly changing data distribution is essential for real-world problems but extremely challenging for modern deep learning. In this work we propose HCL, a Hybrid generative-discriminative approach to Continual Learning for classification. We model the distribution of each task and each class with a normalizing flow. The flow is used to learn the data distribution, perform classification, identify task changes, and avoid forgetting, all leveraging the invertibility and exact likelihood which are uniquely enabled by the normalizing flow model. We use the generative capabilities of the flow to avoid catastrophic forgetting through generative replay and a novel functional regularization technique. For task identification, we use state-of-the-art anomaly detection techniques based on measuring the typicality of the model's statistics.  We demonstrate the strong performance of HCL on a range of continual learning benchmarks such as split-MNIST, split-CIFAR, and SVHN-MNIST.
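摘要中"基于典型性的任务变化识别"可用如下草图说明(假设 flow 对象提供 log_prob 接口,阈值规则是笔者的简化假设,非论文的具体检验):若新批次数据在当前流模型下的平均对数似然偏离该任务的典型范围,则判定任务发生了变化。

```python
import torch

def task_changed(flow, batch, ll_mean, ll_std, k=3.0):
    """典型性检验示意:批次平均对数似然落在 (均值 ± k*标准差) 之外即报警。
    ll_mean / ll_std 为当前任务训练数据上统计出的典型值。"""
    with torch.no_grad():
        ll = flow.log_prob(batch).mean().item()
    return abs(ll - ll_mean) > k * ll_std
```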

【11】 Machine Learning-based Orchestration of Containers: A Taxonomy and Future Directions 标题:基于机器学习的容器编排:分类及未来发展方向

作者:Zhiheng Zhong,Minxian Xu,Maria Alejandra Rodriguez,Chengzhong Xu,Rajkumar Buyya 机构: The University of Melbourne, Shenzhen Institute of Advanced Technology, The University of Macau, Shenzhen University Town 备注:33 pages, 10 figures 链接:https://arxiv.org/abs/2106.12739 摘要:容器化是一种轻量级的应用程序虚拟化技术,提供了高环境一致性、操作系统分发可移植性和资源隔离性。现有的主流云服务提供商在其分布式系统基础设施中普遍采用容器技术来实现应用程序的自动化管理。为了处理容器化应用程序的部署、维护、自动缩放和网络化的自动化,容器编排被认为是一个重要的研究问题。然而,云工作负载和环境的高度动态和多样性特性大大增加了编排机制的复杂性。因此,机器学习算法被用于容器编排系统的行为建模和多维性能度量的预测。这些见解可以进一步提高资源调配决策的质量,以应对复杂环境下不断变化的工作负载。本文对现有的基于机器学习的容器编排方法进行了综述。提出了详细的分类学方法,将现有的研究按其共同特征进行分类。此外,基于机器学习的容器编排技术从2016年到2021年的发展是基于目标和指标设计的。根据所提出的分类法,对所审查的技术进行了比较分析,重点介绍了它们的主要特征。最后,各种开放的研究挑战和潜在的未来方向是突出的。 摘要:Containerization is a lightweight application virtualization technology, providing high environmental consistency, operating system distribution portability, and resource isolation. Existing mainstream cloud service providers have prevalently adopted container technologies in their distributed system infrastructures for automated application management. To handle the automation of deployment, maintenance, autoscaling, and networking of containerized applications, container orchestration is proposed as an essential research problem. However, the highly dynamic and diverse feature of cloud workloads and environments considerably raises the complexity of orchestration mechanisms. Machine learning algorithms are accordingly employed by container orchestration systems for behavior modelling and prediction of multi-dimensional performance metrics. Such insights could further improve the quality of resource provisioning decisions in response to the changing workloads under complex environments. In this paper, we present a comprehensive literature review of existing machine learning-based container orchestration approaches. Detailed taxonomies are proposed to classify the current researches by their common features. Moreover, the evolution of machine learning-based container orchestration technologies from the year 2016 to 2021 has been designed based on objectives and metrics. A comparative analysis of the reviewed techniques is conducted according to the proposed taxonomies, with emphasis on their key characteristics. Finally, various open research challenges and potential future directions are highlighted.

【12】 Online Verification of Deep Neural Networks under Domain or Weight Shift 标题:域或权值漂移条件下的深度神经网络在线验证

作者:Tianhao Wei,Changliu Liu 机构:Carnegie Mellon University 链接:https://arxiv.org/abs/2106.12732 摘要:虽然神经网络的应用非常广泛,但在实际应用中,正式验证神经网络的安全性和鲁棒性仍然是一个挑战。现有的方法都是在使用前对网络进行验证,仅限于相对简单的规范和固定的网络。这些方法还不能应用于具有复杂和/或动态变化的规范和网络的实际问题。为了有效地处理动态变化的规范和网络,当这些变化发生时,需要在线执行验证。然而,在线运行现有的验证算法仍然是一个挑战。我们的关键见解是,我们可以利用这些更改的时间依赖性来加速验证过程,例如,通过使用以前的验证结果热启动新的在线验证。本文提出了一种新的可扩展在线验证框架,解决了规范和/或网络动态变化的实际验证问题,即域转移和权值转移。我们提出三种技术(分支管理、扰动容限分析和增量计算)来加速深层神经网络的在线验证。实验结果表明,我们的在线验证算法比现有的验证算法快两个数量级,可以扩展到实际应用中。 摘要:Although neural networks are widely used, it remains challenging to formally verify the safety and robustness of neural networks in real-world applications. Existing methods are designed to verify the network before use, which is limited to relatively simple specifications and fixed networks. These methods are not ready to be applied to real-world problems with complex and/or dynamically changing specifications and networks. To effectively handle dynamically changing specifications and networks, the verification needs to be performed online when these changes take place. However, it is still challenging to run existing verification algorithms online. Our key insight is that we can leverage the temporal dependencies of these changes to accelerate the verification process, e.g., by warm starting new online verification using previous verified results. This paper establishes a novel framework for scalable online verification to solve real-world verification problems with dynamically changing specifications and/or networks, known as domain shift and weight shift respectively. We propose three types of techniques (branch management, perturbation tolerance analysis, and incremental computation) to accelerate the online verification of deep neural networks. Experiment results show that our online verification algorithm is up to two orders of magnitude faster than existing verification algorithms, and thus can scale to real-world applications.

【13】 Meaningfully Explaining a Model's Mistakes 标题:有意义地解释模型的错误

作者:Abubakar Abid,James Zou 机构:Department of Electrical Engineering, Stanford University; Department of Biomedical Data Science, Stanford University 备注:ICML workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.12723 摘要:理解和解释训练模型所犯的错误,对于许多机器学习目标至关重要,例如提高鲁棒性、应对概念漂移和减少偏差。然而,这通常是一个临时性的过程,需要手动查看模型在许多测试样本上的错误,并猜测这些错误预测的根本原因。在本文中,我们提出了一种系统性方法——概念解释分数(CES),它以人类可理解的概念来解释分类器为什么在特定测试样本上犯错(例如,斑马因条纹过淡而被误分类为狗)。我们的方法基于两个已有思想:反事实解释和概念激活向量,并在知名的预训练模型上验证了该方法,表明它能有意义地解释模型的错误。我们还训练了带有人为引入且已知的虚假相关性的新模型,CES仅凭单个误分类测试样本即可成功识别这些相关性。CES的代码已公开,可以很容易地应用于新模型。 摘要:Understanding and explaining the mistakes made by trained models is critical to many machine learning objectives, such as improving robustness, addressing concept drift, and mitigating biases. However, this is often an ad hoc process that involves manually looking at the model's mistakes on many test samples and guessing at the underlying reasons for those incorrect predictions. In this paper, we propose a systematic approach, conceptual explanation scores (CES), that explains why a classifier makes a mistake on a particular test sample(s) in terms of human-understandable concepts (e.g. this zebra is misclassified as a dog because of faint stripes). We base CES on two prior ideas: counterfactual explanations and concept activation vectors, and validate our approach on well-known pretrained models, showing that it explains the models' mistakes meaningfully. We also train new models with intentional and known spurious correlations, which CES successfully identifies from a single misclassified test sample. The code for CES is publicly available and can easily be applied to new models.
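作为背景,概念激活向量(CAV)这一组件通常可按如下方式实现(通用示意,非CES的官方代码;接口与变量名为笔者假设):在某层激活上训练一个线性探针区分"概念样本"与随机样本,其法向量即CAV;将类别logit对激活的梯度与CAV做内积,可度量该概念对(错误)预测的推动程度。

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def concept_activation_vector(acts_concept, acts_random):
    """训练线性探针分离概念样本与随机样本;CAV即决策边界的单位法向量。"""
    X = np.vstack([acts_concept, acts_random])
    y = np.array([1] * len(acts_concept) + [0] * len(acts_random))
    probe = LogisticRegression(max_iter=1000).fit(X, y)
    v = probe.coef_.ravel()
    return v / np.linalg.norm(v)

def concept_score(grad_logit_wrt_acts, cav):
    """类别logit沿概念方向的方向导数:较大的正值提示该概念在推动预测。"""
    return float(np.dot(grad_logit_wrt_acts, cav))
```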

【14】 Long short-term relevance learning 标题:长短期相关性学习

作者:Bram van de Weg,Lars Greve,Bojana Rosic 机构: University of Twente 链接:https://arxiv.org/abs/2106.12694 摘要:为了将先验知识和测量不确定性结合到传统的长短时记忆(LSTM)神经网络中,提出了一种有效的稀疏贝叶斯训练算法。与经典的LSTM方法相比,该方法能自动确定相关的神经网络连接并进行相应的调整。由于它的灵活性,新的LSTM方法不容易过度拟合,因此可以通过使用较小的数据集来近似时间相关的解。在一个结构非线性有限元的应用中,我们证明了自调节框架不需要事先知道合适的网络结构和大小,同时以合理的计算成本保证了满意的精度。 摘要:To incorporate prior knowledge as well as measurement uncertainties in the traditional long short term memory (LSTM) neural networks, an efficient sparse Bayesian training algorithm is introduced to the network architecture. The proposed scheme automatically determines relevant neural connections and adapts accordingly, in contrast to the classical LSTM solution. Due to its flexibility, the new LSTM scheme is less prone to overfitting, and hence can approximate time dependent solutions by use of a smaller data set. On a structural nonlinear finite element application we show that the self-regulating framework does not require prior knowledge of a suitable network architecture and size, while ensuring satisfying accuracy at reasonable computational cost.

【15】 Best-Case Lower Bounds in Online Learning 标题:在线学习中的最佳情况下界

作者:Cristóbal Guzmán,Nishant A. Mehta,Ali Mortazavi 机构:Department of Applied Mathematics, University of Twente, IMC, Pontificia Universidad Católica de Chile, Department of Computer Science, University of Victoria 备注:28 pages 链接:https://arxiv.org/abs/2106.12688 摘要:在线学习的许多工作都集中在研究遗憾的次线性上界。在这项工作中,我们开始研究在线凸优化中的最佳情况下界,即给出算法相对于事后单一最佳行动所能获得的最大改进的界。这个问题的动机之一是更好地理解学习算法的自适应性。另一个动机来自公平性:众所周知,最佳情况下界有助于获得满足群体公平性概念的决策论在线学习(DTOL)算法。我们的贡献是一种通用的方法,用以给出带时变正则化器的跟随正则化领先者(FTRL)算法的最佳情况下界,并用它证明最佳情况下界与现有的遗憾上界具有相同的阶:这涵盖了固定学习率、递减学习率、无时限(timeless)方法以及自适应梯度方法等情形。与此形成鲜明对比的是,我们证明了线性化版本的FTRL可以达到负的线性遗憾。最后,在具有两个专家和二元预测的DTOL中,我们完整刻画了最佳情况序列,从而对最佳情况下界给出了更精细的理解。 摘要:Much of the work in online learning focuses on the study of sublinear upper bounds on the regret. In this work, we initiate the study of best-case lower bounds in online convex optimization, wherein we bound the largest improvement an algorithm can obtain relative to the single best action in hindsight. This problem is motivated by the goal of better understanding the adaptivity of a learning algorithm. Another motivation comes from fairness: it is known that best-case lower bounds are instrumental in obtaining algorithms for decision-theoretic online learning (DTOL) that satisfy a notion of group fairness. Our contributions are a general method to provide best-case lower bounds in Follow The Regularized Leader (FTRL) algorithms with time-varying regularizers, which we use to show that best-case lower bounds are of the same order as existing upper regret bounds: this includes situations with a fixed learning rate, decreasing learning rates, timeless methods, and adaptive gradient methods. In stark contrast, we show that the linearized version of FTRL can attain negative linear regret. Finally, in DTOL with two experts and binary predictions, we fully characterize the best-case sequences, which provides a finer understanding of the best-case lower bounds.
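作为参考,带时变正则化器的FTRL更新及遗憾的标准写法如下(这是该算法族的常见形式,而非论文的具体设置):

$$x_{t+1} \in \operatorname*{arg\,min}_{x\in\mathcal{X}} \Big\{ \sum_{s=1}^{t} \langle g_s, x\rangle + R_t(x) \Big\},\qquad \mathrm{Regret}_T=\sum_{t=1}^{T}\ell_t(x_t)-\min_{x\in\mathcal{X}}\sum_{t=1}^{T}\ell_t(x).$$

通常的上界刻画 $\mathrm{Regret}_T$ 最多有多大;而"最佳情况下界"刻画的是 $\mathrm{Regret}_T$ 最多能有多负,即算法相对事后最佳行动最多能好多少。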

【16】 Extreme Multi-label Learning for Semantic Matching in Product Search 标题:面向产品搜索语义匹配的极端多标签学习

作者:Wei-Cheng Chang,Daniel Jiang,Hsiang-Fu Yu,Choon-Hui Teo,Jiong Zhang,Kai Zhong,Kedarnath Kolluri,Qie Hu,Nikhil Shandilya,Vyacheslav Ievgrafov,Japinder Singh,Inderjit S. Dhillon 机构:Amazon, USA 备注:Accepted in KDD 2021 Applied Data Science Track 链接:https://arxiv.org/abs/2106.12657 摘要:我们考虑产品搜索中的语义匹配问题:给定一个客户查询,从规模达1亿甚至更大的目录中检索所有语义相关的产品。由于目录空间庞大且存在实时延迟约束,语义匹配算法不仅需要较高的召回率,而且需要较低的延迟。传统的词汇匹配方法(如Okapi-BM25)利用倒排索引实现快速推理,但无法捕捉查询和产品之间的行为信号。相比之下,基于嵌入的模型能从客户行为数据中学习语义表示,但受延迟约束,其性能常常被较浅的神经编码器所限制。语义产品搜索可以看作一个极端多标签分类(XMC)问题,其中客户查询是输入实例,产品是输出标签。在本文中,我们使用基于树的XMC模型改进语义产品搜索,其推理时间复杂度是产品数的对数。我们采用具有n-gram特征的分层线性模型,以实现快速的实时推理。从数量上讲,我们的方法保持了每个查询1.25毫秒的低延迟,并在Recall@100上相对具有竞争力的基于嵌入的DSSM模型取得了65%的提升(60.9%对36.8%)。该模型对不同阈值的权值剪枝具有很强的鲁棒性,能够灵活地满足在线部署的不同系统需求。定性地说,我们的方法可以检索到与现有产品搜索系统互补的产品,并为匹配集增加多样性。 摘要:We consider the problem of semantic matching in product search: given a customer query, retrieve all semantically related products from a huge catalog of size 100 million, or more. Because of large catalog spaces and real-time latency constraints, semantic matching algorithms not only desire high recall but also need to have low latency. Conventional lexical matching approaches (e.g., Okapi-BM25) exploit inverted indices to achieve fast inference time, but fail to capture behavioral signals between queries and products. In contrast, embedding-based models learn semantic representations from customer behavior data, but the performance is often limited by shallow neural encoders due to latency constraints. Semantic product search can be viewed as an eXtreme Multi-label Classification (XMC) problem, where customer queries are input instances and products are output labels. In this paper, we aim to improve semantic product search by using tree-based XMC models where inference time complexity is logarithmic in the number of products. We consider hierarchical linear models with n-gram features for fast real-time inference. Quantitatively, our method maintains a low latency of 1.25 milliseconds per query and achieves a 65% improvement of Recall@100 (60.9% v.s. 36.8%) over a competing embedding-based DSSM model. Our model is robust to weight pruning with varying thresholds, which can flexibly meet different system requirements for online deployments. Qualitatively, our method can retrieve products that are complementary to existing product search system and add diversity to the match set.
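基于树的XMC推理之所以是对数复杂度,是因为查询只需沿标签树做束搜索。下面是一个通用示意(tree、weight等接口均为笔者假设,并非论文或其开源实现的API):每个节点配一个线性打分器,逐层只展开得分最高的若干子节点。

```python
import numpy as np

def log_sigmoid(z):
    # 线性打分器的对数相关概率,数值稳定写法
    return -np.logaddexp(0.0, -z)

def beam_search(tree, x, beam_size=10):
    """标签树上的束搜索:beam 中保存 (累积对数概率, 节点id)。
    每个查询只访问 O(beam_size * log(#labels)) 个节点。"""
    beam = [(0.0, tree.root)]
    while any(not tree.is_leaf(n) for _, n in beam):
        candidates = []
        for score, node in beam:
            if tree.is_leaf(node):
                candidates.append((score, node))
            else:
                for child in tree.children(node):
                    candidates.append((score + log_sigmoid(tree.weight[child] @ x), child))
        beam = sorted(candidates, key=lambda t: t[0], reverse=True)[:beam_size]
    return beam  # 得分最高的叶子即预测标签
```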

【17】 Minimum sharpness: Scale-invariant parameter-robustness of neural networks 标题:最小锐度:神经网络的尺度不变参数鲁棒性

作者:Hikaru Ibayashi,Takuo Hamaguchi,Masaaki Imaizumi 机构:Department of Computer Science, University of Southern California, USA; Komaba Institute for Science, University of Tokyo 备注:9 pages, accepted to ICML 2021 Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.12612 摘要:为了实现鲁棒且具防御性的神经网络,对权重参数扰动的鲁棒性(即锐度,sharpness)近年来受到关注(Sun等人,2020)。然而,锐度度量仍存在一个关键问题:"尺度敏感性"。在本文中,我们提出了一种新的锐度度量,最小锐度(Minimum Sharpness)。众所周知,神经网络存在一类特定的尺度变换,它们构成函数性质完全相同的等价类,而等价网络的锐度却可以无限变化。我们通过在该等价类上求解一个最小化问题来定义锐度,使其对尺度变换保持不变。我们还开发了一种高效而精确的技术,使该锐度易于计算,从而减少了涉及Hessian的繁重计算开销。在实验中,我们观察到所提锐度与神经网络的泛化性能具有有效的相关性,且比现有锐度度量的计算开销更低。 摘要:Toward achieving robust and defensive neural networks, the robustness against the weight parameters perturbations, i.e., sharpness, attracts attention in recent years (Sun et al., 2020). However, sharpness is known to remain a critical issue, "scale-sensitivity." In this paper, we propose a novel sharpness measure, Minimum Sharpness. It is known that NNs have a specific scale transformation that constitutes equivalent classes where functional properties are completely identical, and at the same time, their sharpness could change unlimitedly. We define our sharpness through a minimization problem over the equivalent NNs being invariant to the scale transformation. We also develop an efficient and exact technique to make the sharpness tractable, which reduces the heavy computational costs involved with Hessian. In the experiment, we observed that our sharpness has a valid correlation with the generalization of NNs and runs with less computational cost than existing sharpness measures.
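按摘要的描述,其思想可以示意性地写成如下形式(笔者基于常见锐度定义给出的示意,符号与论文未必一致):

$$S_\rho(w)=\max_{\|\epsilon\|\le\rho} L(w+\epsilon)-L(w),\qquad S_{\min}(w)=\min_{w'\in[w]} S_\rho(w'),$$

其中 $[w]$ 表示由保持网络函数不变的尺度变换(例如对ReLU网络,将某层权重乘以 $c>0$、下一层乘以 $1/c$)生成的等价类。由于 $S_{\min}$ 在整个等价类上取最小值,它按构造对尺度变换不变,从而消除了普通锐度的"尺度敏感性"。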

【18】 DP-SGD vs PATE: Which Has Less Disparate Impact on Model Accuracy? 标题:DP-SGD与PATE:哪个对模型精度的差异性影响更小?

作者:Archit Uniyal,Rakshit Naidu,Sasikanth Kotti,Sahib Singh,Patrik Joslin Kenfack,Fatemehsadat Mireshghallah,Andrew Trask 机构:Panjab University; OpenMined; Manipal Institute of Technology; Carnegie Mellon University; IIT Jodhpur; Ford Motor Company; Innopolis University; University of California, San Diego; University of Oxford 备注:4 pages, 3 images 链接:https://arxiv.org/abs/2106.12576 摘要:差分隐私深度学习的最新进展表明,差分隐私的应用,特别是DP-SGD算法,对总体中不同子群体的影响并不均等:与代表性充分的子群体相比,代表性不足的子群体(少数群体)的模型效用显著下降。在这项工作中,我们旨在从公平性角度比较PATE(另一种使用差分隐私训练深度学习模型的机制)与DP-SGD。我们表明,PATE确实也存在差异性影响,但其严重程度远低于DP-SGD。我们从这一观察中得出了一些见解,指出了在实现更好的公平性与隐私权衡方面可能有希望的方向。 摘要:Recent advances in differentially private deep learning have demonstrated that application of differential privacy, specifically the DP-SGD algorithm, has a disparate impact on different sub-groups in the population, which leads to a significantly high drop-in model utility for sub-populations that are under-represented (minorities), compared to well-represented ones. In this work, we aim to compare PATE, another mechanism for training deep learning models using differential privacy, with DP-SGD in terms of fairness. We show that PATE does have a disparate impact too, however, it is much less severe than DP-SGD. We draw insights from this observation on what might be promising directions in achieving better fairness-privacy trade-offs.

【19】 MIxBN: library for learning Bayesian networks from mixed data 标题:MIxBN:从混合数据中学习贝叶斯网络的库

作者:Anna V. Bubnova,Irina Deeva,Anna V. Kalyuzhnaya 机构:ITMO University, Saint-Petersburg, Russia 链接:https://arxiv.org/abs/2106.13194 摘要:本文介绍了一个新的库,用于从包含离散和连续变量的数据(混合数据)中学习贝叶斯网络。除了在离散化数据上的经典学习方法外,该库还提出了一种算法,允许在不做离散化的情况下对混合数据进行结构学习和参数学习,因为数据离散化会导致信息丢失。该算法基于混合MI评分函数进行结构学习,同时采用线性回归和高斯分布近似进行参数学习。该库还提供了两种枚举图结构的算法:贪婪爬山算法和进化算法。因此,该库的关键能力如下:(1)在离散化数据上对贝叶斯网络进行结构和参数学习,(2)使用混合MI评分函数和高斯近似,在混合数据上对贝叶斯网络进行结构和参数学习,(3)可在两种图结构枚举算法(爬山算法与进化算法)之一上启动学习算法。由于混合数据表示的需求源于实际需要,我们在合成数据和真实数据集上的近似与缺失值恢复问题中评估了所实现方法的优势。 摘要:This paper describes a new library for learning Bayesian networks from data containing discrete and continuous variables (mixed data). In addition to the classical learning methods on discretized data, this library proposes its algorithm that allows structural learning and parameters learning from mixed data without discretization since data discretization leads to information loss. This algorithm based on mixed MI score function for structural learning, and also linear regression and Gaussian distribution approximation for parameters learning. The library also offers two algorithms for enumerating graph structures - the greedy Hill-Climbing algorithm and the evolutionary algorithm. Thus the key capabilities of the proposed library are as follows: (1) structural and parameters learning of a Bayesian network on discretized data, (2) structural and parameters learning of a Bayesian network on mixed data using the MI mixed score function and Gaussian approximation, (3) launching learning algorithms on one of two algorithms for enumerating graph structures - Hill-Climbing and the evolutionary algorithm. Since the need for mixed data representation comes from practical necessity, the advantages of our implementations are evaluated in the context of solving approximation and gap recovery problems on synthetic data and real datasets.
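贪婪爬山结构学习的骨架非常简短,下面给出一个通用示意(评分函数 score 由调用方提供,例如摘要中的混合MI评分;这只是该类算法的一般写法,非MIxBN的实际代码):

```python
import itertools
from collections import defaultdict

def creates_cycle(edges):
    # 基于DFS的有向图环检测
    g = defaultdict(list)
    for u, v in edges:
        g[u].append(v)
    color = defaultdict(int)  # 0=未访问, 1=访问中, 2=完成
    def dfs(n):
        color[n] = 1
        for m in g[n]:
            if color[m] == 1 or (color[m] == 0 and dfs(m)):
                return True
        color[n] = 2
        return False
    return any(color[n] == 0 and dfs(n) for n in list(g))

def hill_climb(nodes, score, max_iter=100):
    """贪婪爬山:每轮尝试加边/删边/反转边,选取使评分提升最大的操作。"""
    edges = set()
    for _ in range(max_iter):
        best_gain, best_edges = 0.0, None
        base = score(edges)
        for u, v in itertools.permutations(nodes, 2):
            if (u, v) in edges:
                moves = [edges - {(u, v)}, (edges - {(u, v)}) | {(v, u)}]
            else:
                moves = [edges | {(u, v)}]
            for cand in moves:
                if creates_cycle(cand):
                    continue
                gain = score(cand) - base
                if gain > best_gain:
                    best_gain, best_edges = gain, cand
        if best_edges is None:
            return edges  # 达到局部最优
        edges = best_edges
    return edges
```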

【20】 Machine learning to tame divergent density functional approximations: a new path to consensus materials design principles 标题:驯服发散密度泛函近似的机器学习:通向一致材料设计原则的新途径

作者:Chenru Duan,Shuxin Chen,Michael G. Taylor,Fang Liu,Heather J. Kulik 机构:Department of Chemical Engineering, Massachusetts Institute of Technology, Cambridge, MA, Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 链接:https://arxiv.org/abs/2106.13109 摘要:基于密度泛函理论(DFT)并由机器学习(ML)加速的计算虚拟高通量筛选(VHTS)是材料快速发现的关键。出于必要,高效的基于DFT的工作流通常只采用单一密度泛函近似(DFA)。然而,对于电子结构具有挑战性的情形(例如开壳层过渡金属配合物,TMC),用不同DFA评估的性质预计会不一致,而这些情形恰恰最需要快速筛选,且通常缺乏准确的基准。为了量化DFA偏差的影响,我们引入了一种方法,可从跨越多个家族和"梯级"(例如从半局域到双杂化)及不同基组的23个代表性DFA中,快速获得2000多个TMC的性质预测。尽管计算性质(如自旋态序和前沿轨道带隙)自然因DFA而异,但在所有DFA之间始终存在高度的线性相关性。我们为每个DFA训练独立的ML模型,并观察到特征重要性的收敛趋势;因此,这些特征提供了DFA不变的通用设计规则。我们设计了一种策略,训练由全部23个DFA共同提供信息的ML模型,并用它们预测超过18.2万个TMC的性质(例如自旋劈裂能)。通过要求人工神经网络(ANN)预测的各DFA性质达成一致,相比通常采用的单一DFA方法,我们提高了这些计算先导化合物与文献挖掘所得实验化合物的对应性。特征分析与基于一致性的机器学习方法,为克服实用DFT的精度限制提供了高效的替代途径。 摘要:Computational virtual high-throughput screening (VHTS) with density functional theory (DFT) and machine-learning (ML)-acceleration is essential in rapid materials discovery. By necessity, efficient DFT-based workflows are carried out with a single density functional approximation (DFA). Nevertheless, properties evaluated with different DFAs can be expected to disagree for the cases with challenging electronic structure (e.g., open shell transition metal complexes, TMCs) for which rapid screening is most needed and accurate benchmarks are often unavailable. To quantify the effect of DFA bias, we introduce an approach to rapidly obtain property predictions from 23 representative DFAs spanning multiple families and "rungs" (e.g., semi-local to double hybrid) and basis sets on over 2,000 TMCs. Although computed properties (e.g., spin-state ordering and frontier orbital gap) naturally differ by DFA, high linear correlations persist across all DFAs. We train independent ML models for each DFA and observe convergent trends in feature importance; these features thus provide DFA-invariant, universal design rules. We devise a strategy to train ML models informed by all 23 DFAs and use them to predict properties (e.g., spin-splitting energy) of over 182k TMCs. By requiring consensus of the ANN-predicted DFA properties, we improve correspondence of these computational lead compounds with literature-mined, experimental compounds over the single-DFA approach typically employed. Both feature analysis and consensus-based ML provide efficient, alternative paths to overcome accuracy limitations of practical DFT.
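"要求各DFA预测达成一致"这一过滤步骤的骨架大致如下(笔者按摘要理解写的示意;一致性判据的具体形式以论文为准):

```python
import numpy as np

def consensus_filter(preds, tol):
    """preds: (n_models, n_samples) 各DFA专属模型对同一性质的预测。
    仅保留所有模型一致(预测极差小于tol)的样本,返回共识均值与保留掩码。"""
    spread = preds.max(axis=0) - preds.min(axis=0)
    keep = spread < tol
    return preds.mean(axis=0)[keep], keep
```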

【21】 Rate Distortion Characteristic Modeling for Neural Image Compression 标题:神经图像压缩的率失真特性建模

作者:Chuanmin Jia,Ziqing Ge,Shanshe Wang,Siwei Ma,Wen Gao 机构:†Department of Computer Science, Peking University, Beijing, China, ⋆Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China 备注:13 pages, 7 figures 链接:https://arxiv.org/abs/2106.12954 摘要:端到端优化能力赋予了神经图像压缩(NIC)优越的有损压缩性能。然而,为了到达率失真(R-D)空间中的不同点,需要训练多个不同的模型。本文研究了NIC的R-D特性分析与建模问题。我们利用深度网络和统计建模方法,构建了描述NIC的R-D行为的基本数学函数。因此,借助该模型,仅用单个训练好的网络即可优雅地实现连续码率点。为此,我们提出了一个插件模块,用来学习目标码率与自编码器潜变量二进制表示之间的关系。此外,我们将NIC的码率和失真特性分别建模为编码参数$\lambda$的函数。实验表明,该方法易于采用,并取得了与固定码率编码方法相当的编码性能,有利于NIC的实际部署。此外,该模型还可用于单网络的NIC码率控制,且码率误差有限。 摘要:End-to-end optimization capability offers neural image compression (NIC) superior lossy compression performance. However, distinct models are required to be trained to reach different points in the rate-distortion (R-D) space. In this paper, we consider the problem of R-D characteristic analysis and modeling for NIC. We make efforts to formulate the essential mathematical functions to describe the R-D behavior of NIC using deep network and statistical modeling. Thus continuous bit-rate points could be elegantly realized by leveraging such model via a single trained network. In this regard, we propose a plugin-in module to learn the relationship between the target bit-rate and the binary representation for the latent variable of auto-encoder. Furthermore, we model the rate and distortion characteristic of NIC as a function of the coding parameter $\lambda$ respectively. Our experiments show our proposed method is easy to adopt and obtains competitive coding performance with fixed-rate coding approaches, which would benefit the practical deployment of NIC. In addition, the proposed model could be applied to NIC rate control with limited bit-rate error using a single network.

【22】 Fundamental limits for learning hidden Markov model parameters 标题:学习隐马尔可夫模型参数的基本界限

作者:Kweku Abraham,Zacharie Naulet,Elisabeth Gassiat 机构:Université Paris-Saclay, CNRS, Laboratoire de Mathématiques d'Orsay, Orsay, France 链接:https://arxiv.org/abs/2106.12936 摘要:我们研究了可学习与不可学习隐马尔可夫模型(HMM)之间的边界。HMM是一种灵活的工具,用于对来自未知总体的相依数据进行聚类。已知只要各簇互不相同,且隐链是遍历的并具有满秩转移矩阵,模型参数就是可识别的。在这些条件中任何一个趋于失效的极限下,参数将变得不可识别。对于具有两个隐状态的链,我们证明了常数因子相匹配的非渐近极小极大上界与下界,它们展现出参数变得可学习的阈值。 摘要:We study the frontier between learnable and unlearnable hidden Markov models (HMMs). HMMs are flexible tools for clustering dependent data coming from unknown populations. The model parameters are known to be identifiable as soon as the clusters are distinct and the hidden chain is ergodic with a full rank transition matrix. In the limit as any one of these conditions fails, it becomes impossible to identify parameters. For a chain with two hidden states we prove nonasymptotic minimax upper and lower bounds, matching up to constants, which exhibit thresholds at which the parameters become learnable.

【23】 Neural ODE to model and prognose thermoacoustic instability 标题:用于建模和预测热声不稳定性的神经常微分方程

作者:Jayesh Dhadphale,Vishnu R. Unni,Abhishek Saha,R. I. Sujith 机构:Department of Aerospace Engineering, Indian Institute of Technology Madras, Chennai, Tamil Nadu , India, Department of Mechanical and Aerospace Engineering, University of California San Diego, La Jolla, CA , United States, ), arXiv:,.,v, [physics.flu-dyn] , Jun 备注:31 pages, 12 figures 链接:https://arxiv.org/abs/2106.12758 摘要:在反应流系统中,非稳态热释放速率与燃烧室声场之间的正耦合驱动了以高振幅压力波动为特征的热声不稳定性。当底层流动为湍流时,当系统的控制参数发生变化,系统接近热声不稳定时,声压振荡与放热率振荡同步。因此,在湍流燃烧室热声不稳定性发生的过程中,系统动力学通过一个间歇状态从混沌振荡过渡到周期振荡。传统的热声系统建模方法是将非定常热源模型与声学子系统耦合,每个子系统独立估计。非定常热源火焰对声起伏的响应是通过引入外部非定常强迫来描述的。这就需要一个强大的激励模块来获得火焰对声扰动的非线性响应。我们引入了一个神经常微分方程(neural ODE)框架来模拟整个热声系统,而不是描述单个子系统。热声系统的神经ODE模型使用热释放速率和压力波动的时间序列,在不引入任何外部扰动的情况下同时测量,来模拟它们的耦合相互作用。此外,我们使用神经常微分方程的参数来定义一个异常度量,它表示系统动力学接近极限环振荡,从而为热声不稳定性的发生提供早期预警信号。 摘要:In reacting flow systems, thermoacoustic instability characterized by high amplitude pressure fluctuations, is driven by a positive coupling between the unsteady heat release rate and the acoustic field of the combustor. When the underlying flow is turbulent, as a control parameter of the system is varied and the system approach thermoacoustic instability, the acoustic pressure oscillations synchronize with heat release rate oscillations. Consequently, during the onset of thermoacoustic instability in turbulent combustors, the system dynamics transition from chaotic oscillations to periodic oscillations via a state of intermittency. Thermoacoustic systems are traditionally modeled by coupling the model for the unsteady heat source and the acoustic subsystem, each estimated independently. The response of the unsteady heat source, the flame, to acoustic fluctuations are characterized by introducing external unsteady forcing. This necessitates a powerful excitation module to obtain the nonlinear response of the flame to acoustic perturbations. Instead of characterizing individual subsystems, we introduce a neural ordinary differential equation (neural ODE) framework to model the thermoacoustic system as a whole. The neural ODE model for the thermoacoustic system uses time series of the heat release rate and the pressure fluctuations, measured simultaneously without introducing any external perturbations, to model their coupled interaction. Further, we use the parameters of neural ODE to define an anomaly measure that represents the proximity of system dynamics to limit cycle oscillations and thus provide an early warning signal for the onset of thermoacoustic instability.

【24】 Provably efficient machine learning for quantum many-body problems 标题:量子多体问题的可证明高效机器学习

作者:Hsin-Yuan Huang,Richard Kueng,Giacomo Torlai,Victor V. Albert,John Preskill 机构:Institute for Quantum Information and Matter and, Department of Computing and Mathematical Sciences, Caltech, Pasadena, CA, USA, Institute for Integrated Circuits, Johannes Kepler University Linz, Austria, AWS Center for Quantum Computing, Pasadena, CA, USA 备注:10 pages, 12 figures + 58 page appendix 链接:https://arxiv.org/abs/2106.12627 摘要:经典机器学习(ML)为解决物理和化学中具有挑战性的量子多体问题提供了一种潜在的强有力的方法。然而,ML相对于传统方法的优势还没有被确定。在这项工作中,我们证明了经典的ML算法可以有效地预测有限维空间中有间隙哈密顿量的基态性质,在学习了测量其他哈密顿量在同一量子相的数据之后。相反,在广泛接受的复杂性理论假设下,不从数据中学习的经典算法无法实现相同的保证。我们还证明了经典的ML算法可以有效地对物质的大范围量子相进行分类。我们的论点基于经典阴影的概念,经典阴影是对多体量子态的一种简洁的经典描述,可以在可行的量子实验中构造并用于预测该态的许多性质。大量的数值实验证实了我们在各种情况下的理论结果,包括里德堡原子系统、二维随机海森堡模型、对称保护的拓扑相和拓扑有序相。 摘要:Classical machine learning (ML) provides a potentially powerful approach to solving challenging quantum many-body problems in physics and chemistry. However, the advantages of ML over more traditional methods have not been firmly established. In this work, we prove that classical ML algorithms can efficiently predict ground state properties of gapped Hamiltonians in finite spatial dimensions, after learning from data obtained by measuring other Hamiltonians in the same quantum phase of matter. In contrast, under widely accepted complexity theory assumptions, classical algorithms that do not learn from data cannot achieve the same guarantee. We also prove that classical ML algorithms can efficiently classify a wide range of quantum phases of matter. Our arguments are based on the concept of a classical shadow, a succinct classical description of a many-body quantum state that can be constructed in feasible quantum experiments and be used to predict many properties of the state. Extensive numerical experiments corroborate our theoretical results in a variety of scenarios, including Rydberg atom systems, 2D random Heisenberg models, symmetry-protected topological phases, and topologically ordered phases.

其他(19篇)

【1】 Efficient Tensor Contraction via Fast Count Sketch 标题:基于快速计数草图的高效张量缩并

作者:Xingyu Cao,Jiani Liu 机构:School of Information and Communication Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu, China. 链接:https://arxiv.org/abs/2106.13062 摘要:草图(sketching)技术使用随机哈希函数进行降维和加速。现有的草图方法,如计数草图(CS)、张量草图(TS)和高阶计数草图(HCS),在一些基于张量的应用中要么精度低,要么速度慢。本文提出的快速计数草图(FCS)将多个基于较短哈希函数的CS应用于输入张量的向量形式,由于可以更充分地保留输入张量的空间信息,因此比TS更精确。当输入张量允许CANDECOMP/PARAFAC分解(CPD)时,FCS可以利用快速傅立叶变换加速CS和HCS,其在低阶张量上的计算复杂度与TS渐近相同。通过CPD、张量回归网络压缩和Kronecker积压缩验证了FCS的有效性。实验结果表明,该算法在逼近精度和计算效率方面都有较好的性能。 摘要:Sketching uses randomized Hash functions for dimensionality reduction and acceleration. The existing sketching methods, such as count sketch (CS), tensor sketch (TS), and higher-order count sketch (HCS), either suffer from low accuracy or slow speed in some tensor based applications. In this paper, the proposed fast count sketch (FCS) applies multiple shorter Hash functions based CS to the vector form of the input tensor, which is more accurate than TS since the spatial information of the input tensor can be preserved more sufficiently. When the input tensor admits CANDECOMP/PARAFAC decomposition (CPD), FCS can accelerate CS and HCS by using fast Fourier transform, which exhibits a computational complexity asymptotically identical to TS for low-order tensors. The effectiveness of FCS is validated by CPD, tensor regression network compression, and Kronecker product compression. Experimental results show its superior performance in terms of approximation accuracy and computational efficiency.
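作为背景,经典计数草图(CS)本身只有几行:用一个桶哈希 h 和一个符号哈希 s 把长度为 n 的向量压到长度 m,并且在共享哈希时内积在期望意义下被保持。下面是一个最小示意(标准CS,不是论文的FCS):

```python
import numpy as np

def count_sketch(x, m, seed=0):
    """经典计数草图:y[h(i)] += s(i) * x[i],其中 h 为桶哈希、s 为±1符号哈希。"""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    h = rng.integers(0, m, size=n)        # 桶哈希
    s = rng.choice([-1.0, 1.0], size=n)   # 符号哈希
    y = np.zeros(m)
    np.add.at(y, h, s * x)
    return y, h, s

# 快速验证:共享 (h, s) 时 <CS(x), CS(z)> 近似 <x, z>
x, z = np.random.randn(10000), np.random.randn(10000)
yx, h, s = count_sketch(x, 256)
yz = np.zeros(256)
np.add.at(yz, h, s * z)
print(np.dot(x, z), np.dot(yx, yz))
```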

【2】 Mix and Mask Actor-Critic Methods 标题:混合与掩码的Actor-Critic方法

作者:Dom Huh 机构:Department of Computer Science, University of California, Davis, CA, USA 链接:https://arxiv.org/abs/2106.13037 摘要:参与者-批评家方法的共享特征空间旨在捕获策略和值函数所使用的广义潜在表示,以期获得更稳定和更有效的样本优化。然而,这种范式在实践中提出了许多挑战,因为生成共享表示的参数必须学习两个不同的目标,从而导致相互竞争的更新和学习扰动。在本文中,我们提出了一个新的特征共享框架来解决这些困难,通过引入混合和掩码机制和分布式规模化技术。这些机制动态地耦合和解耦策略函数和价值函数之间可变的关联潜在特征,而分布尺度化则用概率的观点标准化了这两个目标。从我们的实验结果来看,与使用独立网络和具有共享主干的网络的替代方法相比,我们证明了显著的性能改进。 摘要:Shared feature spaces for actor-critic methods aims to capture generalized latent representations to be used by the policy and value function with the hopes for a more stable and sample-efficient optimization. However, such a paradigm present a number of challenges in practice, as parameters generating a shared representation must learn off two distinct objectives, resulting in competing updates and learning perturbations. In this paper, we present a novel feature-sharing framework to address these difficulties by introducing the mix and mask mechanisms and the distributional scalarization technique. These mechanisms behaves dynamically to couple and decouple connected latent features variably between the policy and value function, while the distributional scalarization standardizes the two objectives using a probabilistic standpoint. From our experimental results, we demonstrate significant performance improvements compared to alternative methods using separate networks and networks with a shared backbone.

【3】 Symmetric Wasserstein Autoencoders 标题:对称Wasserstein自动编码器

作者:Sun Sun,Hongyu Guo 机构:National Research Council Canada, Ottawa, ON, Canada 备注:Accepted by UAI2021 链接:https://arxiv.org/abs/2106.13024 摘要:利用最优传输(Optimal Transport)框架,我们引入了一族新的具有可学习先验的生成式自编码器,称为对称Wasserstein自编码器(SWAE)。我们提出对编码器与解码器所诱导的观测数据与潜在表示的联合分布进行对称匹配。由此产生的算法联合优化了数据空间和潜在空间中的建模损失,其中数据空间中的损失带来了去噪效果。通过对数据与潜在表示的对称处理,该算法隐式地保留了数据在潜在空间中的局部结构。为了进一步提高潜在表示的质量,我们在目标中加入了重构损失,这对生成和重构都有显著好处。实验表明,在分类、重构和生成方面,SWAE优于最先进的生成式自编码器。 摘要:Leveraging the framework of Optimal Transport, we introduce a new family of generative autoencoders with a learnable prior, called Symmetric Wasserstein Autoencoders (SWAEs). We propose to symmetrically match the joint distributions of the observed data and the latent representation induced by the encoder and the decoder. The resulting algorithm jointly optimizes the modelling losses in both the data and the latent spaces with the loss in the data space leading to the denoising effect. With the symmetric treatment of the data and the latent representation, the algorithm implicitly preserves the local structure of the data in the latent space. To further improve the quality of the latent representation, we incorporate a reconstruction loss into the objective, which significantly benefits both the generation and reconstruction. We empirically show the superior performance of SWAEs over the state-of-the-art generative autoencoders in terms of classification, reconstruction, and generation.

【4】 Improved Regret Bounds for Tracking Experts with Memory 标题:带记忆的专家跟踪问题的改进遗憾界

作者:James Robinson,Mark Herbster 机构:Department of Computer Science, University College London, United Kingdom 链接:https://arxiv.org/abs/2106.13021 摘要:我们研究在非平稳环境中依据专家建议进行序列预测的问题,并要求Bousquet和Warmuth意义下的长期记忆保证[4]。我们给出了一个线性时间算法,它改进了已知最好的遗憾界[26]。该算法包含一个相对熵投影步骤。与以往的权重共享方法相比,该投影具有优势,因为权重更新可能伴随隐含成本(例如在投资组合优化中)。我们给出了一个在线性时间内计算该投影步骤的算法,这本身可能具有独立意义。 摘要:We address the problem of sequential prediction with expert advice in a non-stationary environment with long-term memory guarantees in the sense of Bousquet and Warmuth [4]. We give a linear-time algorithm that improves on the best known regret bounds [26]. This algorithm incorporates a relative entropy projection step. This projection is advantageous over previous weight-sharing approaches in that weight updates may come with implicit costs as in for example portfolio optimization. We give an algorithm to compute this projection step in linear time, which may be of independent interest.
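作为示意,相对熵(KL)投影步骤通常可写成如下优化问题(下界约束的具体形式是笔者参照固定份额类方法给出的常见写法,论文中的约束集以原文为准):

$$\tilde w_t=\operatorname*{arg\,min}_{u\in\Delta_N,\ u_i\ge \alpha/N}\ \mathrm{KL}(u\,\|\,w_t),\qquad \mathrm{KL}(u\|w)=\sum_{i=1}^{N} u_i\log\frac{u_i}{w_i}.$$

直观上,将每个专家的权重保持在 $\alpha/N$ 之上相当于为曾经表现好的专家保留"记忆",使算法在环境切换回旧模式时能快速恢复;用投影实现这一点(而非直接混入均匀分布)避免了权重更新自带的隐含成本。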

【5】 A Fully Problem-Dependent Regret Lower Bound for Finite-Horizon MDPs 标题:有限视界MDP完全问题相关的后悔下界

作者:Andrea Tirinzoni,Matteo Pirotta,Alessandro Lazaric 机构:Inria Lille, Facebook AI Research 链接:https://arxiv.org/abs/2106.13013 摘要:我们推导了有限时域表格型马尔可夫决策过程(MDP)中遗憾最小化的一个新的依赖于问题的渐近下界。虽然与先前的工作(例如遍历MDP)类似,该下界是一个优化问题的解,但我们的推导揭示了需要对状态-动作对上的访问分布施加额外的约束,以显式地反映MDP的动力学。我们通过一系列例子刻画了该下界,说明不同的MDP可能具有显著不同的复杂度。1) 我们首先考虑一个"困难的"MDP实例,与经典分析相比,基于动力学的新约束导致更大的下界(即更大的遗憾)。2) 然后,我们证明该下界可以恢复先前针对特定MDP实例导出的结果。3) 最后,我们证明,在某些"简单"MDP中,下界比一般情况小得多,并且它完全不随最小动作间隙(action gap)缩放。通过为一个乐观算法给出基于策略间隙的遗憾上界,我们证明了最后一个结果是可达的(至多相差$poly(H)$项,其中$H$为视界)。 摘要:We derive a novel asymptotic problem-dependent lower-bound for regret minimization in finite-horizon tabular Markov Decision Processes (MDPs). While, similar to prior work (e.g., for ergodic MDPs), the lower-bound is the solution to an optimization problem, our derivation reveals the need for an additional constraint on the visitation distribution over state-action pairs that explicitly accounts for the dynamics of the MDP. We provide a characterization of our lower-bound through a series of examples illustrating how different MDPs may have significantly different complexity. 1) We first consider a "difficult" MDP instance, where the novel constraint based on the dynamics leads to a larger lower-bound (i.e., a larger regret) compared to the classical analysis. 2) We then show that our lower-bound recovers results previously derived for specific MDP instances. 3) Finally, we show that, in certain "simple" MDPs, the lower bound is considerably smaller than in the general case and it does not scale with the minimum action gap at all. We show that this last result is attainable (up to $poly(H)$ terms, where $H$ is the horizon) by providing a regret upper-bound based on policy gaps for an optimistic algorithm.

【6】 L'Apprentissage Automatique dans la planification et le contrôle de la production : un état de l'art 标题:生产计划与控制中的机器学习:研究现状综述

作者:Juan Pablo Usuga Cadavid,Samir Lamouri,Bernard Grabot,Arnaud Fortin 机构:LAMIH, ARTS ET METIERS PARISTECH, Boulevard de l'Hôpital, Paris, France; LGP, ENIT, Avenue d'Azereix, Tarbes, France; IFAKT FRANCE SAS, Esplanade compans caffarelli, Toulouse, France 备注:None 链接:https://arxiv.org/abs/2106.12916 摘要:完善的生产计划与控制(PPC)对于在竞争中占据优势、降低成本和遵守交货期至关重要。就PPC而言,机器学习(ML)为基于数据的智能决策提供了新的机会。因此,本文对ML在PPC中应用的相关文献进行了初步的系统性综述。本研究的目标有两个:一是确定在PPC中应用ML的技术和工具,二是梳理近期研究论文中工业4.0(I4.0)的特征。针对第二个目标,分析框架中使用了I4.0的七个特征,其中两个由作者提出。此外,本文还确定了科学文献中ML辅助PPC所涉及的领域。最后,对研究结果进行了分析,并指出了可能激发进一步研究的空白。 摘要:Proper Production Planning and Control (PPC) is capital to have an edge over competitors, reduce costs and respect delivery dates. With regard to PPC, Machine Learning (ML) provides new opportunities to make intelligent decisions based on data. Therefore, this communication provides an initial systematic review of publications on ML applied in PPC. The research objective of this study is twofold: firstly, it aims to identify techniques and tools allowing to apply ML in PPC, and secondly, it reviews the characteristics of Industry 4.0 (I4.0) in recent research papers. Concerning the second objective, seven characteristics of I4.0 are used in the analysis framework, from which two of them are proposed by the authors. Additionally, the addressed domains of ML-aided PPC in scientific literature are identified. Finally, results are analyzed and gaps that may motivate further research are highlighted.

【7】 Numerical influence of ReLU'(0) on backpropagation 标题:ReLU'(0)对反向传播的数值影响

作者:David Bertoin,Jérôme Bolte,Sébastien Gerchinovitz,Edouard Pauwels 机构:IRT Saint-Exupéry, ISAE-SUPAERO, ANITI, Toulouse, France; Toulouse School of Economics, Université Toulouse Capitole; IMT, Université Paul Sabatier; CNRS, IRIT, Université Paul Sabatier 链接:https://arxiv.org/abs/2106.12915 摘要:理论上,神经网络中ReLU'(0)在[0,1]内的取值对反向传播和训练的影响可以忽略不计。然而,在现实中,32位的默认精度加上深度学习问题的规模,使其成为训练方法的一个超参数。我们在多种网络(全连接、VGG、ResNet)和数据集(MNIST、CIFAR10、SVHN)上研究了ReLU'(0)取值在几个精度级别(16、32、64位)下的重要性。我们观察到,在32位精度下,大约有一半的时间会出现反向传播输出的显著变化。该效应在双精度下消失,而在16位精度下则是系统性出现的。对于朴素SGD训练,选择ReLU'(0)=0似乎最为高效。我们还发现,批归一化(batch-norm)或ADAM等再调节(reconditioning)方法往往会缓冲ReLU'(0)取值的影响。总的来说,我们要传达的信息是,非光滑问题的算法微分潜在地隐藏着可以有利地调整的参数。 摘要:In theory, the choice of ReLU'(0) in [0, 1] for a neural network has a negligible influence both on backpropagation and training. Yet, in the real world, 32 bits default precision combined with the size of deep learning problems makes it a hyperparameter of training methods. We investigate the importance of the value of ReLU'(0) for several precision levels (16, 32, 64 bits), on various networks (fully connected, VGG, ResNet) and datasets (MNIST, CIFAR10, SVHN). We observe considerable variations of backpropagation outputs which occur around half of the time in 32 bits precision. The effect disappears with double precision, while it is systematic at 16 bits. For vanilla SGD training, the choice ReLU'(0) = 0 seems to be the most efficient. We also evidence that reconditioning approaches as batch-norm or ADAM tend to buffer the influence of ReLU'(0)'s value. Overall, the message we want to convey is that algorithmic differentiation of nonsmooth problems potentially hides parameters that could be tuned advantageously.
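下面的PyTorch草图展示了如何把ReLU'(0)作为可调数值来复现这类实验(示意代码,非论文作者的实现):

```python
import torch

class ReLUWithSubgradient(torch.autograd.Function):
    """在 x = 0 处导数取给定常数 s ∈ [0, 1] 的ReLU。"""
    @staticmethod
    def forward(ctx, x, s=0.5):
        ctx.save_for_backward(x)
        ctx.s = s
        return x.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        d = (x > 0).to(grad_out.dtype)                      # x > 0 处导数为 1
        d = torch.where(x == 0, torch.full_like(d, ctx.s), d)  # x = 0 处取 s
        return grad_out * d, None

x = torch.tensor([-1.0, 0.0, 2.0], requires_grad=True)
ReLUWithSubgradient.apply(x, 1.0).sum().backward()
print(x.grad)  # tensor([0., 1., 1.]):0 处的梯度由 s 决定
```

在低精度下,前向计算恰好落在 0 上的激活并不罕见,这正是摘要所述"约一半时间反传输出发生显著变化"的来源。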

【8】 Non-Autoregressive TTS with Explicit Duration Modelling for Low-Resource Highly Expressive Speech 标题:面向低资源高表现力语音的显式时长建模非自回归TTS

作者:Raahil Shah,Kamil Pokora,Abdelhamid Ezzerg,Viacheslav Klimkov,Goeric Huybrechts,Bartosz Putrycz,Daniel Korzekwa,Thomas Merritt 机构:Amazon Text-to-Speech Research 备注:6 pages, 5 figures. Accepted to Speech Synthesis Workshop (SSW) 2021 链接:https://arxiv.org/abs/2106.12896 摘要:虽然最近的神经文本到语音(TTS)方法产生高质量的语音,但它们通常需要目标说话人的大量录音。在以前的工作中,提出了一种三步方法来生成高质量的TTS,同时大大减少了训练所需的数据量。然而,我们观察到,当使用这种方法时,高表达声音的自然度水平会出现天花板效应。在本文中,我们提出了一种方法来建立高表现力的TTS语音,只需15分钟的语音数据从目标发言者。与目前最先进的方法相比,我们提出的改进方案在语音自然度和说话人相似性方面与录音的差距分别缩小了23.3%和16.3%。此外,我们仅使用15分钟的目标说话人数据来匹配基于Tacotron2的完整数据模型(约10小时)的自然度和说话人相似性,而在30分钟或更长时间内,我们的性能明显优于它。提出了以下改进:1)由基于注意的自回归TTS模型改为非自回归模型,用外部持续时间模型代替注意;2)增加一个基于条件生成对抗网络(cGAN)的微调步骤。 摘要:Whilst recent neural text-to-speech (TTS) approaches produce high-quality speech, they typically require a large amount of recordings from the target speaker. In previous work, a 3-step method was proposed to generate high-quality TTS while greatly reducing the amount of data required for training. However, we have observed a ceiling effect in the level of naturalness achievable for highly expressive voices when using this approach. In this paper, we present a method for building highly expressive TTS voices with as little as 15 minutes of speech data from the target speaker. Compared to the current state-of-the-art approach, our proposed improvements close the gap to recordings by 23.3% for naturalness of speech and by 16.3% for speaker similarity. Further, we match the naturalness and speaker similarity of a Tacotron2-based full-data (~10 hours) model using only 15 minutes of target speaker data, whereas with 30 minutes or more, we significantly outperform it. The following improvements are proposed: 1) changing from an autoregressive, attention-based TTS model to a non-autoregressive model replacing attention with an external duration model and 2) an additional Conditional Generative Adversarial Network (cGAN) based fine-tuning step.

【9】 Evaluation of Saliency-based Explainability Method 标题:基于显著性的可解释性方法评估

作者:Sam Zabdiel Sunder Samuel,Vidhya Kamakshi,Namrata Lodhi,Narayanan C Krishnan 机构:Department of Computer Science and Engineering, Indian Institute of Technology Ropar 备注:Accepted at the ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI, 2021 链接:https://arxiv.org/abs/2106.12773 摘要:一类特殊的可解释人工智能(XAI)方法提供显著性图,以突出卷积神经网络(CNN)模型在对图像分类时所关注的图像区域,作为解释其工作原理的一种方式。这些方法为用户理解CNN的预测提供了一种直观途径。除了定量的计算测试之外,支持这些方法价值的证据绝大多数是轶事性的。考虑到人类将是这些方法的最终用户,我们设计了三项人类受试者实验,以评估这些基于显著性的解释方法的有效性。 摘要:A particular class of Explainable AI (XAI) methods provide saliency maps to highlight part of the image a Convolutional Neural Network (CNN) model looks at to classify the image as a way to explain its working. These methods provide an intuitive way for users to understand predictions made by CNNs. Other than quantitative computational tests, the vast majority of evidence to highlight that the methods are valuable is anecdotal. Given that humans would be the end-users of such methods, we devise three human subject experiments through which we gauge the effectiveness of these saliency-based explainability methods.

【10】 TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration 标题:TagRuler:通过演示进行跨度级数据编程的交互式工具

作者:Dongjin Choi,Sara Evensen,Çağatay Demiralp,Estevam Hruschka 机构:Georgia Institute of Technology, USA, Megagon Labs, USA, Sigma Computing, USA 备注:WWW'21 Demo 链接:https://arxiv.org/abs/2106.12767 摘要:尽管机器学习研究领域发展迅速,但为监督学习收集高质量的标签仍然是许多应用的瓶颈。NLP任务的最新模型变得越来越深、越来越复杂,即使只做微调,所需的训练数据量也常常随之增加,这进一步加剧了上述困难。包括数据编程在内的弱监督方法通过使用带噪声的标注源进行监督,解决了这一问题并降低了标签收集的成本。然而直到最近,数据编程仍只面向会编程的用户。为了弥合这一差距,"演示式数据编程"(DPBD)框架被提出,以便根据领域专家标注的少量实例自动创建标注函数。该框架已被成功用于为文档分类生成高精度的标注模型。在这项工作中,我们将DPBD框架扩展到跨度级(span-level)标注任务,它可以说是最耗时的NLP标注任务之一。我们构建了一个新工具TagRuler,它使标注者无需编程即可轻松构建跨度级标注函数,并鼓励他们探索不同标注模型与主动学习策略之间的权衡。实验表明,在不同的跨度级标注任务上,与手工标注相比,标注者使用所提工具可以获得更高的F1分数。 摘要:Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models for NLP tasks are becoming deeper and more complex, often increasing the amount of training data required even for fine-tuning. Weak supervision methods, including data programming, address this problem and reduce the cost of label collection by using noisy label sources for supervision. However, until recently, data programming was only accessible to users who knew how to program. To bridge this gap, the Data Programming by Demonstration framework was proposed to facilitate the automatic creation of labeling functions based on a few examples labeled by a domain expert. This framework has proven successful for generating high-accuracy labeling models for document classification. In this work, we extend the DPBD framework to span-level annotation tasks, arguably one of the most time-consuming NLP labeling tasks. We built a novel tool, TagRuler, that makes it easy for annotators to build span-level labeling functions without programming and encourages them to explore trade-offs between different labeling models and active learning strategies. We empirically demonstrated that an annotator could achieve a higher F1 score using the proposed tool compared to manual labeling for different span-level annotation tasks.
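作为背景说明,跨度级标注函数大致形如下例(纯玩具示例,仅用于说明"跨度级"的输入输出形式,与TagRuler自动生成的函数无关):输入词序列,输出(起点,终点,标签)三元组。

```python
def lf_price_span(tokens):
    # 玩具跨度级标注函数:把 "$" 后面的数字标为 PRICE 跨度
    spans = []
    for i in range(len(tokens) - 1):
        if tokens[i] == "$" and tokens[i + 1].replace(",", "").isdigit():
            spans.append((i, i + 2, "PRICE"))
    return spans

print(lf_price_span("the offer was $ 120,000 per year".split()))
# [(3, 5, 'PRICE')]
```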

【11】 Partial Maximum Correntropy Regression for Robust Trajectory Decoding from Noisy Epidural Electrocorticographic Signals 标题:偏最大相关熵回归在含噪硬膜外皮层脑电信号鲁棒轨迹解码中的应用

作者:Yuanhao Li,Badong Chen,Gang Wang,Natsue Yoshimura,Yasuharu Koike 机构:Xi'an Jiaotong University; Key Laboratory of Biomedical Information Engineering of Ministry of Education, School of Life Science and Technology; Institute of Biomedical Engineering 链接:https://arxiv.org/abs/2106.13086 摘要:偏最小二乘回归(PLSR)算法在脑机接口中从相互关联的脑记录预测连续变量方面表现出卓越的能力,最近成功地实现了从猕猴硬膜外皮层电图到三维连续手部轨迹的预测。然而,PLSR本质上是基于最小二乘准则构建的,因而对复杂噪声不具有鲁棒性。本研究的目的是提出一个鲁棒版本的PLSR。为此,采用最大相关熵准则构造了PLSR的一个新的鲁棒变体,即偏最大相关熵回归(PMCR)。利用半二次优化技术计算鲁棒潜变量。我们在一个合成示例和公开的Neurotycho数据集上评估了所提出的PMCR。与传统PLSR及其最新变体相比,在训练集受到污染的情况下,PMCR在三个不同的性能指标上都实现了更优的预测能力。结果表明,PMCR是一种从含噪脑测量中进行鲁棒解码的有效方法,可减轻不利噪声导致的性能退化,从而提高脑机接口的解码鲁棒性。 摘要:The Partial Least Square Regression (PLSR) algorithm exhibits exceptional competence for predicting continuous variables from inter-correlated brain recordings in brain-computer interfaces, which achieved successful prediction from epidural electrocorticography of macaques to three-dimensional continuous hand trajectories recently. Nevertheless, PLSR is in essence formulated based on the least square criterion, thus, being non-robust with respect to complicated noises consequently. The aim of the present study is to propose a robust version of PLSR. To this end, the maximum correntropy criterion is adopted to structure a new robust variant of PLSR, namely Partial Maximum Correntropy Regression (PMCR). Half-quadratic optimization technique is utilized to calculate the robust latent variables. We assess the proposed PMCR on a synthetic example and the public Neurotycho dataset. Compared with the conventional PLSR and the state-of-the-art variant, PMCR realized superior prediction competence on three different performance indicators with contaminated training set. The proposed PMCR was demonstrated as an effective approach for robust decoding from noisy brain measurements, which could reduce the performance degradation resulting from adverse noises, thus, improving the decoding robustness of brain-computer interfaces.
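作为对照,最大相关熵准则(MCC)的常见高斯核形式如下(这是MCC的标准写法,并非论文中PMCR的完整目标函数):

$$\max_\theta\ \frac{1}{T}\sum_{t=1}^{T}\kappa_\sigma\big(y_t-\hat y_t(\theta)\big),\qquad \kappa_\sigma(e)=\exp\!\Big(-\frac{\|e\|^2}{2\sigma^2}\Big).$$

与最小二乘不同,大误差(离群点)在该目标下被指数级降权,这正是PMCR对训练集污染保持鲁棒的来源;核宽 $\sigma$ 控制对离群点的容忍度。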

【12】 Stochastic Projective Splitting: Solving Saddle-Point Problems with Multiple Regularizers 标题:随机投影分裂:求解多正则鞍点问题

作者:Patrick R. Johnstone,Jonathan Eckstein,Thomas Flynn,Shinjae Yoo 机构:Computational Science Initiative, Brookhaven National Laboratory, Department of Management Science and Information Systems, Rutgers University 链接:https://arxiv.org/abs/2106.13067 摘要:我们针对单调包含问题提出了投影分裂(PS)算法族的一个新的随机变体。它可以求解鲁棒机器学习等应用中出现的极小极大与非合作博弈形式的问题,且不存在梯度下降-上升(此类场景当前事实上的标准方法)所伴随的收敛问题。我们的方法是PS的第一个能够使用随机(而非确定性)梯度预言机的版本。它也是第一个能在求解极小极大博弈的同时,通过投影与邻近算子轻松处理多重约束和非光滑正则化子的随机方法。最后,我们在一个分布鲁棒的稀疏logistic回归问题上进行了数值实验。 摘要:We present a new, stochastic variant of the projective splitting (PS) family of algorithms for monotone inclusion problems. It can solve min-max and noncooperative game formulations arising in applications such as robust ML without the convergence issues associated with gradient descent-ascent, the current de facto standard approach in such situations. Our proposal is the first version of PS able to use stochastic (as opposed to deterministic) gradient oracles. It is also the first stochastic method that can solve min-max games while easily handling multiple constraints and nonsmooth regularizers via projection and proximal operators. We close with numerical experiments on a distributionally robust sparse logistic regression problem.

【13】 Quantization Aware Training, ERNIE and Kurtosis Regularizer: a short empirical study 标题:量化感知训练、ERNIE与峰度正则化器:一项简短的实证研究

作者:Andrea Zanetti 备注:13 pages, 8 figures 链接:https://arxiv.org/abs/2106.13035 摘要:像Ernie或Bert这样的预训练语言模型目前被用于许多应用程序。这些模型带有一组预训练好的权重,通常是在海量数据上以无监督/自监督方式获得的。之后,它们会在特定任务上进行微调。应用程序随后使用这些模型进行推理,且通常附带一些额外约束,如低功耗预算或输入输出之间的低延迟。满足这些推理场景附加要求的主要途径是使用低精度计算(例如用INT8替代FP32),但这会导致模型的功能性能(例如精度)恶化。已经有一些方法被提出来解决这一问题并突破训练后量化(PTQ)的限制,更具体地说是QAT(量化感知训练,参见[4]):它通过干预训练过程,使训练本身受到量化阶段的影响(或者说扰动)。除QAT之外,最近Intel Habana实验室还提出了一种更直接的方法:使用正则化器使训练结果对后续量化更具鲁棒性,即改变驱动训练过程的损失函数。但对于像Ernie这样的预训练模型,他们的方案并不能开箱即用。在这篇短文中,我们将说明(就Ernie而言)为什么会这样,并提出一种非常基础的处理方法,同时分享一些初步结果(最终INT8精度的提升),这些结果可能会引起希望在低精度场景中使用Ernie的从业者的兴趣。 摘要:Pre-trained language models like Ernie or Bert are currently used in many applications. These models come with a set of pre-trained weights typically obtained in unsupervised/self-supervised modality on a huge amount of data. After that, they are fine-tuned on a specific task. Applications then use these models for inference, and often some additional constraints apply, like low power-budget or low latency between input and output. The main avenue to meet these additional requirements for the inference settings, is to use low precision computation (e.g. INT8 rather than FP32), but this comes with a cost of deteriorating the functional performance (e.g. accuracy) of the model. Some approaches have been developed to tackle the problem and go beyond the limitations of the PTO (Post-Training Quantization), more specifically the QAT (Quantization Aware Training, see [4]) is a procedure that interferes with the training process in order to make it affected (or simply disturbed) by the quantization phase during the training itself. Besides QAT, recently Intel-Habana Labs have proposed an additional and more direct way to make the training results more robust to subsequent quantization which uses a regularizer, therefore changing the loss function that drives the training procedure. But their proposal does not work out-of-the-box for pre-trained models like Ernie, for example. In this short paper we show why this is not happening (for the Ernie case) and we propose a very basic way to deal with it, sharing as well some initial results (increase in final INT8 accuracy) that might be of interest to practitioners willing to use Ernie in their applications, in low precision regime.
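峰度正则化器的大致形态如下(按摘要精神写的示意;目标峰度1.8对应均匀分布、平方惩罚的形式均为笔者的假设,具体以Intel Habana的论文为准):把每层权重分布的峰度推向"类均匀"形状,使其在量化时更平滑地落入各量化格点。

```python
import torch

def kurtosis(w, eps=1e-8):
    # 权重张量的样本峰度 E[((w - mu) / sigma)^4]
    mu, sigma = w.mean(), w.std()
    return ((w - mu) / (sigma + eps)).pow(4).mean()

def kurtosis_regularizer(model, target=1.8):
    """对每个权重矩阵/卷积核,惩罚其峰度偏离目标值(1.8为均匀分布的峰度)。"""
    reg = 0.0
    for p in model.parameters():
        if p.dim() > 1:  # 只处理权重,跳过偏置
            reg = reg + (kurtosis(p) - target) ** 2
    return reg

# 用法示意:loss = task_loss + lam * kurtosis_regularizer(model)
```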

【14】 GNMR: A provable one-line algorithm for low rank matrix recovery 标题:GNMR:一种可证明的低秩矩阵恢复单行算法

作者:Pini Zilber,Boaz Nadler 机构:Weizmann Institute of Science 链接:https://arxiv.org/abs/2106.12933 摘要:低秩矩阵恢复问题,包括矩阵补全和矩阵传感,有着广泛的应用。在这项工作中,我们提出了GNMR——一种基于高斯-牛顿线性化的极其简单的低秩矩阵恢复迭代算法。在理论上,我们推导了GNMR在矩阵传感和矩阵补全两种情形下的恢复保证。GNMR的一个关键性质是,它在整个迭代过程中隐式地保持因子矩阵近似平衡。在实验方面,我们表明对于均匀采样的矩阵补全问题,GNMR的性能优于几种常用方法,特别是在观测数量很少、接近信息极限时。 摘要:Low rank matrix recovery problems, including matrix completion and matrix sensing, appear in a broad range of applications. In this work we present GNMR -- an extremely simple iterative algorithm for low rank matrix recovery, based on a Gauss-Newton linearization. On the theoretical front, we derive recovery guarantees for GNMR in both the matrix sensing and matrix completion settings. A key property of GNMR is that it implicitly keeps the factor matrices approximately balanced throughout its iterations. On the empirical front, we show that for matrix completion with uniform sampling, GNMR performs better than several popular methods, especially when given very few observations close to the information limit.
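按摘要的描述,GNMR每步只需解一个对因子线性化的最小二乘问题。下面是矩阵补全情形的一个稠密、未优化的示意实现(笔者按摘要理解写出,迭代变体的细节以论文为准;仅适合小规模问题):

```python
import numpy as np

def gnmr_step(U, V, mask, M):
    """一步高斯-牛顿迭代:求解线性最小二乘
        min_{U',V'} || P_Omega(U' V^T + U V'^T) - P_Omega(M + U V^T) ||_F^2
    并返回新的因子 (U', V')。"""
    n1, r = U.shape
    n2, _ = V.shape
    rows, cols = np.nonzero(mask)
    A = np.zeros((len(rows), (n1 + n2) * r))
    for k, (i, j) in enumerate(zip(rows, cols)):
        A[k, i * r:(i + 1) * r] = V[j]                    # U'[i] 的系数
        A[k, n1 * r + j * r:n1 * r + (j + 1) * r] = U[i]  # V'[j] 的系数
    b = (M + U @ V.T)[rows, cols]
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:n1 * r].reshape(n1, r), sol[n1 * r:].reshape(n2, r)

# 用法示意:从随机初始化出发,反复 U, V = gnmr_step(U, V, mask, M)
```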

【15】 Lettuce: PyTorch-based Lattice Boltzmann Framework 标题:Lettuce:基于PyTorch的格子Boltzmann框架

作者:Mario Christopher Bedrunka,Dominik Wilde,Martin Kliemank,Dirk Reith,Holger Foysi,Andreas Krämer 机构:Department of Mechanical Engineering, University of Siegen, Paul-Bonatz-Straße, Siegen-Weidenau, Germany; Institute of Technology, Resource and Energy-efficient Engineering (TREE), Bonn-Rhein-Sieg University of Applied Sciences, Grantham-Allee, Sankt 链接:https://arxiv.org/abs/2106.12929 摘要:格子Boltzmann方法(LBM)是计算流体力学等领域的一种高效模拟技术。它基于笛卡尔网格上简单的流动与碰撞(stream-and-collide)算法,很容易与现代机器学习架构兼容。虽然人们越来越清楚地认识到深度学习可以为经典模拟技术提供决定性的推动,但近期研究尚未探讨机器学习与LBM之间可能存在的联系。在这里,我们介绍Lettuce,一个具有三重目标的基于PyTorch的LBM代码。Lettuce能够以最少的源代码实现GPU加速计算,便于LBM模型的快速原型开发,并能将LBM模拟与PyTorch的深度学习和自动微分功能集成。作为将机器学习与LBM相结合的概念验证,我们建立了一个神经碰撞模型,在双周期剪切层上训练,然后将其迁移到另一种流动,即衰减湍流。我们还举例说明了PyTorch自动微分框架在流动控制和优化方面的额外优势。为此,在不进一步约束速度场的情况下,维持了受迫各向同性湍流的能谱。源代码可从 https://github.com/lettucecfd/lettuce 免费获取。 摘要:The lattice Boltzmann method (LBM) is an efficient simulation technique for computational fluid mechanics and beyond. It is based on a simple stream-and-collide algorithm on Cartesian grids, which is easily compatible with modern machine learning architectures. While it is becoming increasingly clear that deep learning can provide a decisive stimulus for classical simulation techniques, recent studies have not addressed possible connections between machine learning and LBM. Here, we introduce Lettuce, a PyTorch-based LBM code with a threefold aim. Lettuce enables GPU accelerated calculations with minimal source code, facilitates rapid prototyping of LBM models, and enables integrating LBM simulations with PyTorch's deep learning and automatic differentiation facility. As a proof of concept for combining machine learning with the LBM, a neural collision model is developed, trained on a doubly periodic shear layer and then transferred to a different flow, a decaying turbulence. We also exemplify the added benefit of PyTorch's automatic differentiation framework in flow control and optimization. To this end, the spectrum of a forced isotropic turbulence is maintained without further constraining the velocity field. The source code is freely available from https://github.com/lettucecfd/lettuce.
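摘要提到的"流动与碰撞"算法本身非常简短。下面是一个通用的D2Q9 BGK单步示意(与Lettuce的实际API无关),可以直观说明LBM为何天然适合PyTorch的张量运算与GPU加速:

```python
import torch

# D2Q9 格子:9个离散速度及其权重
e = torch.tensor([[0, 0], [1, 0], [0, 1], [-1, 0], [0, -1],
                  [1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=torch.float32)
w = torch.tensor([4/9] + [1/9] * 4 + [1/36] * 4)

def equilibrium(rho, u):
    # Maxwell-Boltzmann 平衡分布的二阶展开
    eu = torch.einsum('qd,dxy->qxy', e, u)
    uu = (u * u).sum(dim=0)
    return w[:, None, None] * rho * (1 + 3 * eu + 4.5 * eu ** 2 - 1.5 * uu)

def stream_and_collide(f, tau=0.6):
    """周期网格上的一步BGK格子Boltzmann:先碰撞(向平衡分布松弛),再流动。"""
    rho = f.sum(dim=0)                                  # 密度
    u = torch.einsum('qd,qxy->dxy', e, f) / rho         # 速度
    f = f + (equilibrium(rho, u) - f) / tau             # 碰撞(BGK松弛)
    for q in range(9):                                  # 沿 e_q 方向流动
        f[q] = torch.roll(f[q], shifts=(int(e[q, 0]), int(e[q, 1])), dims=(0, 1))
    return f

# 初始化示意:f = equilibrium(torch.ones(nx, ny), torch.zeros(2, nx, ny)),
# 之后反复调用 stream_and_collide 即可推进流场。
```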

【16】 Accelerating variational quantum algorithms with multiple quantum processors

Authors: Yuxuan Du, Yang Qian, Dacheng Tao
Affiliations: JD Explore Academy; School of Computer Science, The University of Sydney
Link: https://arxiv.org/abs/2106.12819
Abstract: Variational quantum algorithms (VQAs) have the potential of utilizing near-term quantum machines to gain certain computational advantages over classical methods. Nevertheless, modern VQAs suffer from cumbersome computational overhead, hampered by the tradition of employing a solitary quantum processor to handle large-volume data. As such, to better exert the superiority of VQAs, it is of great significance to improve their runtime efficiency. Here we devise an efficient distributed optimization scheme, called QUDIO, to address this issue. Specifically, in QUDIO, a classical central server partitions the learning problem into multiple subproblems and allocates them to multiple local nodes, each of which consists of a quantum processor and a classical optimizer. During the training procedure, all local nodes proceed with parallel optimization and the classical server synchronizes optimization information among local nodes in a timely manner. In doing so, we prove a sublinear convergence rate of QUDIO with respect to the number of global iterations under the ideal scenario, while system imperfections may incur divergent optimization. Numerical results on standard benchmarks demonstrate that QUDIO can surprisingly achieve a superlinear runtime speedup with respect to the number of local nodes. Our proposal can be readily combined with other advanced VQA-based techniques to narrow the gap between the state of the art and applications with quantum advantage.
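The server/local-node structure described above can be mimicked with a toy classical simulation: the server shards the problem, each "node" takes a local optimization step on its sub-loss (in the paper this would be a quantum processor evaluating a variational circuit; here a least-squares loss stands in), and the server synchronizes by averaging. All names and the averaging rule are assumptions for illustration, not the paper's algorithm spec.

```python
import numpy as np

def local_step(theta, shard, lr=0.1):
    """One local node: a gradient step on its sub-loss. A least-squares
    loss stands in for the quantum-circuit loss a real node would evaluate."""
    X, y = shard
    grad = 2 * X.T @ (X @ theta - y) / len(y)
    return theta - lr * grad

def qudio_train(X, y, n_nodes=4, n_rounds=50):
    """Toy synchronous variant: shard the problem, optimize locally
    (sequentially here, in parallel on real hardware), average to sync."""
    rng = np.random.default_rng(0)
    theta = rng.normal(size=X.shape[1])
    shards = list(zip(np.array_split(X, n_nodes), np.array_split(y, n_nodes)))
    for _ in range(n_rounds):
        theta = np.mean([local_step(theta, s) for s in shards], axis=0)
    return theta
```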

【17】 All unconstrained strongly convex problems are weakly simplicial

Authors: Yusuke Mizota, Naoki Hamada, Shunsuke Ichiki
Comments: 19 pages, 3 figures. arXiv admin note: text overlap with arXiv:1912.09328
Link: https://arxiv.org/abs/2106.12704
Abstract: A multi-objective optimization problem is $C^r$ weakly simplicial if there exists a $C^r$ surjection from a simplex onto the Pareto set/front such that the image of each subsimplex is the Pareto set/front of a subproblem, where $0\leq r\leq \infty$. This property is helpful for computing a parametric-surface approximation of the entire Pareto set and Pareto front. It is known that all unconstrained strongly convex $C^r$ problems are $C^{r-1}$ weakly simplicial for $1\leq r \leq \infty$. In this paper, we show that all unconstrained strongly convex problems are $C^0$ weakly simplicial. The usefulness of this theorem is demonstrated in a sparse modeling application: we reformulate the elastic net as a non-differentiable multi-objective strongly convex problem and approximate its Pareto set (the set of all trained models with different hyper-parameters) and Pareto front (the set of performance metrics of the trained models) by using a Bézier simplex fitting method, which accelerates hyper-parameter search.
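A minimal sketch of the application described at the end of the abstract: sweep the elastic-net hyper-parameters, train one model per setting, and record the objective vector of each trained model; a Bézier simplex could then be fitted to these points (the fitting step itself is not shown). The objective triple used below (squared error, L1 norm, squared L2 norm) is one plausible multi-objective reading of the elastic net, and sklearn's `ElasticNet` is used purely for convenience; neither is taken from the authors' code.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

def objective_vector(w, X, y):
    """(squared error, L1 norm, squared L2 norm) of one trained model."""
    return np.array([np.sum((X @ w - y) ** 2), np.abs(w).sum(), np.sum(w ** 2)])

def sample_pareto_points(X, y,
                         alphas=np.logspace(-3, 1, 10),
                         ratios=np.linspace(0.1, 1.0, 5)):
    """One point per hyper-parameter setting; together they sample the
    Pareto front that a Bezier simplex would then be fitted to."""
    pts = []
    for a in alphas:
        for r in ratios:
            w = ElasticNet(alpha=a, l1_ratio=r, max_iter=10_000).fit(X, y).coef_
            pts.append(objective_vector(w, X, y))
    return np.array(pts)
```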

【18】 Multi-objective Asynchronous Successive Halving

Authors: Robin Schmucker, Michele Donini, Muhammad Bilal Zafar, David Salinas, Cédric Archambeau
Affiliations: Machine Learning Department, Carnegie Mellon University; Amazon, Berlin, Germany
Link: https://arxiv.org/abs/2106.12639
Abstract: Hyperparameter optimization (HPO) is increasingly used to automatically tune the predictive performance (e.g., accuracy) of machine learning models. However, in a plethora of real-world applications, accuracy is only one of multiple -- often conflicting -- performance criteria, necessitating the adoption of a multi-objective (MO) perspective. While the literature on MO optimization is rich, few prior studies have focused on HPO. In this paper, we propose algorithms that extend asynchronous successive halving (ASHA) to the MO setting. Considering multiple evaluation metrics, we assess the performance of these methods on three real-world tasks: (i) neural architecture search, (ii) algorithmic fairness and (iii) language model optimization. Our empirical analysis shows that MO ASHA enables MO HPO at scale. Further, we observe that taking the entire Pareto front into account for candidate selection consistently outperforms multi-fidelity HPO based on MO scalarization in terms of wall-clock time. Our algorithms (to be open-sourced) establish new baselines for future research in the area.
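The candidate-selection rule the last finding refers to, "taking the entire Pareto front into account", can be sketched as follows: when a rung of configurations must be halved, promote non-dominated configurations first instead of ranking by a single scalarized score. The promotion rule below is a plausible reconstruction under that reading, not the paper's exact algorithm.

```python
import numpy as np

def non_dominated(costs):
    """Indices of rows not dominated by any other row (all objectives minimized)."""
    idx = []
    for i in range(len(costs)):
        dominated = any(
            np.all(costs[j] <= costs[i]) and np.any(costs[j] < costs[i])
            for j in range(len(costs)))
        if not dominated:
            idx.append(i)
    return idx

def promote(costs, eta=3):
    """Pick roughly 1/eta of a rung for promotion, peeling Pareto fronts
    in order (as in non-dominated sorting) until the budget is filled."""
    costs = np.asarray(costs)
    budget = max(1, len(costs) // eta)
    remaining, chosen = list(range(len(costs))), []
    while remaining and len(chosen) < budget:
        front = [remaining[i] for i in non_dominated(costs[remaining])]
        chosen.extend(front[: budget - len(chosen)])
        remaining = [i for i in remaining if i not in front]
    return chosen
```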

【19】 Rank 2r iterative least squares: efficient recovery of ill-conditioned low rank matrices from few entries

Authors: Jonathan Bauch, Boaz Nadler, Pini Zilber
Link: https://arxiv.org/abs/2002.01849
Abstract: We present a new, simple and computationally efficient iterative method for low rank matrix completion. Our method is inspired by the class of factorization-type iterative algorithms, but substantially differs from them in the way the problem is cast. Precisely, given a target rank $r$, instead of optimizing on the manifold of rank $r$ matrices, we allow our interim estimated matrix to have a specific over-parametrized rank $2r$ structure. Our algorithm, denoted R2RILS for rank $2r$ iterative least squares, has low memory requirements, and at each iteration it solves a computationally cheap sparse least-squares problem. We motivate our algorithm by its theoretical analysis for the simplified case of a rank-1 matrix. Empirically, R2RILS is able to recover ill-conditioned low rank matrices from very few observations -- near the information limit, and it is stable to additive noise.
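R2RILS solves a sparse least-squares problem per iteration of essentially the same family as GNMR in entry 【14】 above (the two papers share authors), so rather than repeating that construction, here is a hedged end-to-end usage sketch: spectral initialization followed by iterations of the hypothetical `gnmr_step` helper defined in that earlier block. The problem sizes and the oversampling level are arbitrary choices, and this demo is not the authors' reference implementation of either method.

```python
import numpy as np
from scipy.sparse import coo_matrix
from scipy.sparse.linalg import svds

# toy rank-3 completion problem with p observed entries
rng = np.random.default_rng(0)
m, n, r, p = 100, 80, 3, 2500
A = rng.normal(size=(m, r)) @ rng.normal(size=(r, n))
rows = rng.integers(0, m, size=p)
cols = rng.integers(0, n, size=p)
vals = A[rows, cols]

# spectral initialization from the rescaled observed entries
X0 = coo_matrix((vals * (m * n / p), (rows, cols)), shape=(m, n))
u, s, vt = svds(X0.tocsc(), k=r)
U, V = u * np.sqrt(s), vt.T * np.sqrt(s)

for _ in range(30):
    U, V = gnmr_step(U, V, rows, cols, vals)   # sketch from entry 【14】

print("relative error:", np.linalg.norm(U @ V.T - A) / np.linalg.norm(A))
```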
