
Machine Learning arXiv Digest [6.24]

By the WeChat official account arXiv每日学术速递 (arXiv Daily Digest)
Published 2021-07-02 18:19:01

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features! Click "Read the original" to visit.

cs.LG: 86 papers today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (6 papers)

【1】 From Canonical Correlation Analysis to Self-supervised Graph Neural Networks

Authors: Hengrui Zhang, Qitian Wu, Junchi Yan, David Wipf, Philip S. Yu
Affiliations: University of Illinois at Chicago; Shanghai Jiao Tong University; AWS Shanghai AI Lab
Link: https://arxiv.org/abs/2106.12484
Abstract: We introduce a conceptually simple yet effective model for self-supervised representation learning with graph data. It follows the previous methods that generate two views of an input graph through data augmentation. However, unlike contrastive methods that focus on instance-level discrimination, we optimize an innovative feature-level objective inspired by classical Canonical Correlation Analysis. Compared with other works, our approach requires none of the parameterized mutual information estimator, additional projector, asymmetric structures, and most importantly, negative samples which can be costly. We show that the new objective essentially 1) aims at discarding augmentation-variant information by learning invariant representations, and 2) can prevent degenerated solutions by decorrelating features in different dimensions. Our theoretical analysis further provides an understanding for the new objective which can be equivalently seen as an instantiation of the Information Bottleneck Principle under the self-supervised setting. Despite its simplicity, our method performs competitively on seven public graph datasets.
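The feature-level objective described in the abstract — align the two augmented views while decorrelating feature dimensions — is compact enough to illustrate in NumPy. The following is an illustrative rendering of the idea, not the authors' released code; the standardization details and the trade-off weight `lam` are our assumptions:

```python
import numpy as np

def cca_ssg_loss(z1, z2, lam=1e-3):
    """Invariance term (align the two augmented views) plus a decorrelation
    term (push each view's feature covariance toward identity)."""
    n, d = z1.shape
    # standardize each view so the covariance matrices are well scaled
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    invariance = ((z1 - z2) ** 2).sum() / n
    c1, c2 = z1.T @ z1 / n, z2.T @ z2 / n
    eye = np.eye(d)
    decorrelation = ((c1 - eye) ** 2).sum() + ((c2 - eye) ** 2).sum()
    return invariance + lam * decorrelation

rng = np.random.default_rng(0)
z = rng.normal(size=(32, 8))    # embeddings of one "view"
loss_same = cca_ssg_loss(z, z)  # identical views: no invariance penalty
loss_diff = cca_ssg_loss(z, z + rng.normal(scale=0.5, size=z.shape))
print(loss_same, loss_diff)
```

Minimizing the decorrelation term is what prevents the degenerate solution in which every feature dimension collapses to the same value.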

【2】 GraphConfRec: A Graph Neural Network-Based Conference Recommender System

Authors: Andreea Iana, Heiko Paulheim
Affiliations: Data and Web Science Group, University of Mannheim, Mannheim, Germany
Note: Accepted at the Joint Conference on Digital Libraries (JCDL 2021)
Link: https://arxiv.org/abs/2106.12340
Abstract: In today's academic publishing model, especially in Computer Science, conferences commonly constitute the main platforms for releasing the latest peer-reviewed advancements in their respective fields. However, choosing a suitable academic venue for publishing one's research can represent a challenging task considering the plethora of available conferences, particularly for those at the start of their academic careers, or for those seeking to publish outside of their usual domain. In this paper, we propose GraphConfRec, a conference recommender system which combines SciGraph and graph neural networks, to infer suggestions based not only on title and abstract, but also on co-authorship and citation relationships. GraphConfRec achieves a recall@10 of up to 0.580 and a MAP of up to 0.336 with a graph attention network-based recommendation model. A user study with 25 subjects supports the positive results.

【3】 MG-DVD: A Real-time Framework for Malware Variant Detection Based on Dynamic Heterogeneous Graph Learning

Authors: Chen Liu, Bo Li, Jun Zhao, Ming Su, Xu-Dong Liu
Affiliations: School of Computer Science and Engineering, Beihang University, Beijing, China; Beijing Advanced Innovation Center for Big Data and Brain Computing, Beihang University, China
Note: 8 pages, 7 figures, IJCAI-21 (the 30th International Joint Conference on Artificial Intelligence)
Link: https://arxiv.org/abs/2106.12288
Abstract: Detecting the newly emerging malware variants in real time is crucial for mitigating cyber risks and proactively blocking intrusions. In this paper, we propose MG-DVD, a novel detection framework based on dynamic heterogeneous graph learning, to detect malware variants in real time. Particularly, MG-DVD first models the fine-grained execution event streams of malware variants into dynamic heterogeneous graphs and investigates real-world meta-graphs between malware objects, which can effectively characterize more discriminative malicious evolutionary patterns between malware and their variants. Then, MG-DVD presents two dynamic walk-based heterogeneous graph learning methods to learn more comprehensive representations of malware variants, which significantly reduces the cost of the entire graph retraining. As a result, MG-DVD is equipped with the ability to detect malware variants in real time, and it presents better interpretability by introducing meaningful meta-graphs. Comprehensive experiments on large-scale samples prove that our proposed MG-DVD outperforms state-of-the-art methods in detecting malware variants in terms of effectiveness and efficiency.

【4】 NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs

Authors: Mikhail Galkin, Jiapeng Wu, Etienne Denis, William L. Hamilton
Affiliations: Mila, McGill University, Montreal, Canada
Link: https://arxiv.org/abs/2106.12144
Abstract: Conventional representation learning algorithms for knowledge graphs (KG) map each entity to a unique embedding vector. Such a shallow lookup results in a linear growth of memory consumption for storing the embedding matrix and incurs high computational costs when working with real-world KGs. Drawing parallels with subword tokenization commonly used in NLP, we explore the landscape of more parameter-efficient node embedding strategies with possibly sublinear memory requirements. To this end, we propose NodePiece, an anchor-based approach to learn a fixed-size entity vocabulary. In NodePiece, a vocabulary of subword/sub-entity units is constructed from anchor nodes in a graph with known relation types. Given such a fixed-size vocabulary, it is possible to bootstrap an encoding and embedding for any entity, including those unseen during training. Experiments show that NodePiece performs competitively in node classification, link prediction, and relation prediction tasks while retaining less than 10% of explicit nodes in a graph as anchors and often having 10x fewer parameters.
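The anchor-based hashing at the core of NodePiece can be illustrated on a toy graph: each entity is represented by its nearest anchor nodes together with their hop distances, so any entity (including unseen ones) can be encoded from the same fixed vocabulary. This is a simplified sketch; the actual model also hashes relational context and feeds the tokens through a learned encoder:

```python
from collections import deque

def bfs_distances(adj, source):
    """Unweighted shortest-path hop distances from `source` via BFS."""
    dist = {source: 0}
    q = deque([source])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return dist

def tokenize(node, anchors, adj, k=2):
    """Hash a node to its k nearest anchors, NodePiece-style (sketch).
    Returns [(hops, anchor_id), ...] sorted by distance."""
    pairs = []
    for a in anchors:
        d = bfs_distances(adj, a).get(node)
        if d is not None:  # skip anchors in disconnected components
            pairs.append((d, a))
    pairs.sort()
    return pairs[:k]

# tiny path graph 0-1-2-3-4 with anchors {0, 4}
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
tokens = tokenize(2, anchors=[0, 4], adj=adj)
print(tokens)  # node 2 is 2 hops from each anchor
```

A downstream model would then embed only the anchors and combine the token embeddings (plus distances), instead of storing one vector per entity.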

【5】 ABCD: A Graph Framework to Convert Complex Sentences to a Covering Set of Simple Sentences

Authors: Yanjun Gao, Ting-Hao Huang, Rebecca J. Passonneau
Affiliations: Pennsylvania State University
Note: To appear in the proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL 2021), Main Conference
Link: https://arxiv.org/abs/2106.12027
Abstract: Atomic clauses are fundamental text units for understanding complex sentences. Identifying the atomic sentences within complex sentences is important for applications such as summarization, argument mining, discourse analysis, discourse parsing, and question answering. Previous work mainly relies on rule-based methods dependent on parsing. We propose a new task to decompose each complex sentence into simple sentences derived from the tensed clauses in the source, and a novel problem formulation as a graph edit task. Our neural model learns to Accept, Break, Copy or Drop elements of a graph that combines word adjacency and grammatical dependencies. The full processing pipeline includes modules for graph construction, graph editing, and sentence generation from the output graph. We introduce DeSSE, a new dataset designed to train and evaluate complex sentence decomposition, and MinWiki, a subset of MinWikiSplit. ABCD achieves comparable performance as two parsing baselines on MinWiki. On DeSSE, which has a more even balance of complex sentence types, our model achieves higher accuracy on the number of atomic sentences than an encoder-decoder baseline. Results include a detailed error analysis.

【6】 Exploring the Representational Power of Graph Autoencoder

Authors: Maroun Haddad, Mohamed Bouguessa
Affiliations: Department of Computer Science, University of Quebec at Montreal, Montreal, Quebec, Canada
Link: https://arxiv.org/abs/2106.12005
Abstract: While representation learning has yielded a great success on many graph learning tasks, there is little understanding behind the structures that are being captured by these embeddings. For example, we wonder if the topological features, such as the Triangle Count, the Degree of the node and other centrality measures are concretely encoded in the embeddings. Furthermore, we ask if the presence of these structures in the embeddings is necessary for a better performance on the downstream tasks, such as clustering and classification. To address these questions, we conduct an extensive empirical study over three classes of unsupervised graph embedding models and seven different variants of Graph Autoencoders. Our results show that five topological features: the Degree, the Local Clustering Score, the Betweenness Centrality, the Eigenvector Centrality, and Triangle Count are concretely preserved in the first layer of the graph autoencoder that employs the SUM aggregation rule, under the condition that the model preserves the second-order proximity. We supplement further evidence for the presence of these features by revealing a hierarchy in the distribution of the topological features in the embeddings of the aforementioned model. We also show that a model with such properties can outperform other models on certain downstream tasks, especially when the preserved features are relevant to the task at hand. Finally, we evaluate the suitability of our findings through a test case study related to social influence prediction.

GANs | Adversarial | Attacks | Generation (3 papers)

【1】 Teacher Model Fingerprinting Attacks Against Transfer Learning

Authors: Yufei Chen, Chao Shen, Cong Wang, Yang Zhang
Affiliations: Xi'an Jiaotong University; City University of Hong Kong; CISPA Helmholtz Center for Information Security
Link: https://arxiv.org/abs/2106.12478
Abstract: Transfer learning has become a common solution to address training data scarcity in practice. It trains a specified student model by reusing or fine-tuning early layers of a well-trained teacher model that is usually publicly available. However, besides utility improvement, the transferred public knowledge also brings potential threats to model confidentiality, and even further raises other security and privacy issues. In this paper, we present the first comprehensive investigation of the teacher model exposure threat in the transfer learning context, aiming to gain a deeper insight into the tension between public knowledge and model confidentiality. To this end, we propose a teacher model fingerprinting attack to infer the origin of a student model, i.e., the teacher model it transfers from. Specifically, we propose a novel optimization-based method to carefully generate queries to probe the student model to realize our attack. Unlike existing model reverse engineering approaches, our proposed fingerprinting method neither relies on fine-grained model outputs, e.g., posteriors, nor auxiliary information of the model architecture or training dataset. We systematically evaluate the effectiveness of our proposed attack. The empirical results demonstrate that our attack can accurately identify the model origin with few probing queries. Moreover, we show that the proposed attack can serve as a stepping stone to facilitating other attacks against machine learning models, such as model stealing.

【2】 Alias-Free Generative Adversarial Networks

Authors: Tero Karras, Miika Aittala, Samuli Laine, Erik Härkönen, Janne Hellsten, Jaakko Lehtinen, Timo Aila
Affiliations: Aalto University and NVIDIA; NVIDIA and Aalto University
Link: https://arxiv.org/abs/2106.12423
Abstract: We observe that despite their hierarchical convolutional nature, the synthesis process of typical generative adversarial networks depends on absolute pixel coordinates in an unhealthy manner. This manifests itself as, e.g., detail appearing to be glued to image coordinates instead of the surfaces of depicted objects. We trace the root cause to careless signal processing that causes aliasing in the generator network. Interpreting all signals in the network as continuous, we derive generally applicable, small architectural changes that guarantee that unwanted information cannot leak into the hierarchical synthesis process. The resulting networks match the FID of StyleGAN2 but differ dramatically in their internal representations, and they are fully equivariant to translation and rotation even at subpixel scales. Our results pave the way for generative models better suited for video and animation.
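The aliasing failure mode the paper identifies can be reproduced in one dimension: a sinusoid sampled below twice its frequency is indistinguishable from a lower-frequency alias. This is a toy demonstration of the underlying signal-processing principle, not the paper's generator code:

```python
import numpy as np

fs = 10.0
t = np.arange(10) / fs              # ten samples over one second
high = np.sin(2 * np.pi * 7 * t)    # 7 Hz tone, above the 5 Hz Nyquist limit
alias = np.sin(2 * np.pi * -3 * t)  # its alias: 7 - 10 = -3 Hz
print(np.allclose(high, alias))
```

On the sample grid the two signals are identical, which is why the paper insists on proper low-pass filtering before any resampling step inside the network.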

【3】 Fairness for Image Generation with Uncertain Sensitive Attributes

Authors: Ajil Jalal, Sushrut Karmalkar, Jessica Hoffmann, Alexandros G. Dimakis, Eric Price
Affiliations: The University of Texas at Austin
Link: https://arxiv.org/abs/2106.12182
Abstract: This work tackles the issue of fairness in the context of generative procedures, such as image super-resolution, which entail different definitions from the standard classification setting. Moreover, while traditional group fairness definitions are typically defined with respect to specified protected groups -- camouflaging the fact that these groupings are artificial and carry historical and political motivations -- we emphasize that there are no ground truth identities. For instance, should South and East Asians be viewed as a single group or separate groups? Should we consider one race as a whole or further split by gender? Choosing which groups are valid and who belongs in them is an impossible dilemma and being "fair" with respect to Asians may require being "unfair" with respect to South Asians. This motivates the introduction of definitions that allow algorithms to be oblivious to the relevant groupings. We define several intuitive notions of group fairness and study their incompatibilities and trade-offs. We show that the natural extension of demographic parity is strongly dependent on the grouping, and impossible to achieve obliviously. On the other hand, the conceptually new definition we introduce, Conditional Proportional Representation, can be achieved obliviously through Posterior Sampling. Our experiments validate our theoretical results and achieve fair image reconstruction using state-of-the-art generative models.

Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (6 papers)

【1】 Learning Multimodal VAEs through Mutual Supervision

Authors: Tom Joy, Yuge Shi, Philip H. S. Torr, Tom Rainforth, Sebastian M. Schmon, N. Siddharth
Affiliations: University of Oxford; Improbable; University of Edinburgh
Link: https://arxiv.org/abs/2106.12570
Abstract: Multimodal VAEs seek to model the joint distribution over heterogeneous data (e.g. vision, language), whilst also capturing a shared representation across such modalities. Prior work has typically combined information from the modalities by reconciling idiosyncratic representations directly in the recognition model through explicit products, mixtures, or other such factorisations. Here we introduce a novel alternative, the MEME, that avoids such explicit combinations by repurposing semi-supervised VAEs to combine information between modalities implicitly through mutual supervision. This formulation naturally allows learning from partially-observed data where some modalities can be entirely missing -- something that most existing approaches either cannot handle, or do so to a limited extent. We demonstrate that MEME outperforms baselines on standard metrics across both partial and complete observation schemes on the MNIST-SVHN (image--image) and CUB (image--text) datasets. We also contrast the quality of the representations learnt by mutual supervision against standard approaches and observe interesting trends in its ability to capture relatedness between data.

【2】 Generative Self-training for Cross-domain Unsupervised Tagged-to-Cine MRI Synthesis

Authors: Xiaofeng Liu, Fangxu Xing, Maureen Stone, Jiachen Zhuo, Reese Timothy, Jerry L. Prince, Georges El Fakhri, Jonghye Woo
Affiliations: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts; Dept. of Neural and Pain Sciences, University of Maryland School of Dentistry, Baltimore, MD, USA; Athinoula A. Martinos Center for Biomedical Imaging, Dept. of Radiology
Note: MICCAI 2021 (early accept, <13%)
Link: https://arxiv.org/abs/2106.12499
Abstract: Self-training based unsupervised domain adaptation (UDA) has shown great potential to address the problem of domain shift, when applying a trained deep learning model in a source domain to unlabeled target domains. However, while the self-training UDA has demonstrated its effectiveness on discriminative tasks, such as classification and segmentation, via the reliable pseudo-label selection based on the softmax discrete histogram, the self-training UDA for generative tasks, such as image synthesis, is not fully investigated. In this work, we propose a novel generative self-training (GST) UDA framework with continuous value prediction and regression objective for cross-domain image synthesis. Specifically, we propose to filter the pseudo-label with an uncertainty mask, and quantify the predictive confidence of generated images with practical variational Bayes learning. The fast test-time adaptation is achieved by a round-based alternative optimization scheme. We validated our framework on the tagged-to-cine magnetic resonance imaging (MRI) synthesis problem, where datasets in the source and target domains were acquired from different scanners or centers. Extensive validations were carried out to verify our framework against popular adversarial training UDA methods. Results show that our GST, with tagged MRI of test subjects in new target domains, improved the synthesis quality by a large margin, compared with the adversarial training UDA methods.
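The uncertainty-mask step — keep a continuous-valued pseudo-label only where the model's predictive spread is small — can be sketched as follows. Using the standard deviation over stochastic forward passes, and the particular threshold, are our illustrative choices, not the paper's exact formulation:

```python
import numpy as np

def uncertainty_mask(predictions, threshold):
    """Filter continuous-valued pseudo-labels by predictive uncertainty.
    `predictions`: (T, N) array of T stochastic forward passes per target.
    Returns the mean prediction and a mask keeping low-uncertainty targets."""
    mean = predictions.mean(axis=0)
    std = predictions.std(axis=0)
    return mean, std < threshold

rng = np.random.default_rng(1)
# four targets the model agrees on, four it does not (10 passes each)
confident = rng.normal(loc=0.8, scale=0.01, size=(10, 4))
uncertain = rng.normal(loc=0.5, scale=0.5, size=(10, 4))
preds = np.concatenate([confident, uncertain], axis=1)  # shape (10, 8)
pseudo, mask = uncertainty_mask(preds, threshold=0.1)
print(mask)  # low-variance targets kept, noisy ones masked out
```

Only the unmasked pseudo-labels would then contribute to the regression objective during self-training rounds.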

【3】 Unsupervised Speech Enhancement using Dynamical Variational Auto-Encoders

Authors: Xiaoyu Bie, Simon Leglaive, Xavier Alameda-Pineda, Laurent Girin
Link: https://arxiv.org/abs/2106.12271
Abstract: Dynamical variational auto-encoders (DVAEs) are a class of deep generative models with latent variables, dedicated to time series data modeling. DVAEs can be considered as extensions of the variational autoencoder (VAE) that include the modeling of temporal dependencies between successive observed and/or latent vectors in data sequences. Previous work has shown the interest of DVAEs and their better performance over the VAE for speech signals (spectrogram) modeling. Independently, the VAE has been successfully applied to speech enhancement in noise, in an unsupervised noise-agnostic set-up that does not require the use of a parallel dataset of clean and noisy speech samples for training, but only requires clean speech signals. In this paper, we extend those works to DVAE-based single-channel unsupervised speech enhancement, hence exploiting both speech signals unsupervised representation learning and dynamics modeling. We propose an unsupervised speech enhancement algorithm based on the most general form of DVAEs, that we then adapt to three specific DVAE models to illustrate the versatility of the framework. More precisely, we combine DVAE-based speech priors with a noise model based on nonnegative matrix factorization, and we derive a variational expectation-maximization (VEM) algorithm to perform speech enhancement. Experimental results show that the proposed approach based on DVAEs outperforms its VAE counterpart and a supervised speech enhancement baseline.

【4】 Uncertainty-Aware Model-Based Reinforcement Learning with Application to Autonomous Driving

Authors: Jingda Wu, Zhiyu Huang, Chen Lv
Link: https://arxiv.org/abs/2106.12194
Abstract: To further improve the learning efficiency and performance of reinforcement learning (RL), in this paper we propose a novel uncertainty-aware model-based RL (UA-MBRL) framework, and then implement and validate it in autonomous driving under various task scenarios. First, an action-conditioned ensemble model with the ability of uncertainty assessment is established as the virtual environment model. Then, a novel uncertainty-aware model-based RL framework is developed based on the adaptive truncation approach, providing virtual interactions between the agent and environment model, and improving RL's training efficiency and performance. The developed algorithms are then implemented in end-to-end autonomous vehicle control tasks, validated and compared with state-of-the-art methods under various driving scenarios. The validation results suggest that the proposed UA-MBRL method surpasses the existing model-based and model-free RL approaches, in terms of learning efficiency and achieved performance. The results also demonstrate the good ability of the proposed method with respect to the adaptiveness and robustness, under various autonomous driving scenarios.
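The two ingredients the abstract names — an action-conditioned ensemble whose disagreement measures uncertainty, and virtual rollouts truncated when that uncertainty grows — can be sketched with scalar toy dynamics. This is illustrative only; the fixed-threshold rule stands in for the paper's adaptive truncation, and the models here are hand-written functions rather than learned networks:

```python
import numpy as np

def ensemble_predict(models, state, action):
    """Mean next-state prediction plus an epistemic-uncertainty estimate
    (the spread across ensemble members)."""
    preds = np.stack([m(state, action) for m in models])
    return preds.mean(axis=0), float(np.max(preds.std(axis=0)))

def rollout(models, state, policy, horizon, u_max):
    """Virtual rollout that stops once model uncertainty exceeds u_max."""
    traj = [state]
    for _ in range(horizon):
        state, u = ensemble_predict(models, state, policy(state))
        if u > u_max:
            break  # the model is no longer trustworthy here
        traj.append(state)
    return traj

# toy "dynamics models" that agree near the origin and diverge far from it
models = [lambda s, a, w=w: (1 + 0.1 * w * abs(s)) * s + a for w in range(3)]
policy = lambda s: 0.5
traj = rollout(models, 0.0, policy, horizon=20, u_max=0.05)
print(len(traj))
```

The rollout collects virtual experience only while the ensemble agrees, which is what keeps model errors from corrupting the RL updates.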

【5】 A Simple Baseline for Batch Active Learning with Stochastic Acquisition Functions

Authors: Andreas Kirsch, Sebastian Farquhar, Yarin Gal
Affiliations: Department of Computer Science, University of Oxford
Link: https://arxiv.org/abs/2106.12059
Abstract: In active learning, new labels are commonly acquired in batches. However, common acquisition functions are only meant for one-sample acquisition rounds at a time, and when their scores are used naively for batch acquisition, they result in batches lacking diversity, which deteriorates performance. On the other hand, state-of-the-art batch acquisition functions are costly to compute. In this paper, we present a novel class of stochastic acquisition functions that extend one-sample acquisition functions to the batch setting by observing how one-sample acquisition scores change as additional samples are acquired and modelling this difference for additional batch samples. We simply acquire new samples by sampling from the pool set using a Gibbs distribution based on the acquisition scores. Our acquisition functions are both vastly cheaper to compute and out-perform other batch acquisition functions.
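The core recipe — turn one-sample acquisition scores into a batch by sampling from a Gibbs distribution over the pool — fits in a few lines. This is a sketch of that sampling step; `beta` is our name for the inverse temperature, and the scores are made up:

```python
import numpy as np

def stochastic_batch_acquire(scores, batch_size, beta=1.0, rng=None):
    """Sample a batch (without replacement) from a Gibbs/softmax
    distribution over one-sample acquisition scores. Higher `beta`
    concentrates on top-scoring points; beta -> 0 approaches uniform."""
    rng = rng or np.random.default_rng()
    logits = beta * (scores - scores.max())  # subtract max for stability
    probs = np.exp(logits)
    probs /= probs.sum()
    return rng.choice(len(scores), size=batch_size, replace=False, p=probs)

rng = np.random.default_rng(0)
scores = np.array([0.1, 0.2, 5.0, 4.8, 0.3, 4.9])
batch = stochastic_batch_acquire(scores, batch_size=3, beta=2.0, rng=rng)
print(sorted(batch))  # the high-scoring indices are heavily favored
```

Because the batch is sampled rather than taken as the top-k, the selected points are diverse without any pairwise-interaction computation, which is where the cost advantage over top-k batch acquisition comes from.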

【6】 Finding simplicity: unsupervised discovery of features, patterns, and order parameters via shift-invariant variational autoencoders

Authors: Maxim Ziatdinov, Chun Yin Wong, Sergei V. Kalinin
Link: https://arxiv.org/abs/2106.12472
Abstract: Recent advances in scanning tunneling and transmission electron microscopies (STM and STEM) have allowed routine generation of large volumes of imaging data containing information on the structure and functionality of materials. The experimental data sets contain signatures of long-range phenomena such as physical order parameter fields, polarization and strain gradients in STEM, or standing electronic waves and carrier-mediated exchange interactions in STM, all superimposed onto scanning system distortions and gradual changes of contrast due to drift and/or mis-tilt effects. Correspondingly, while the human eye can readily identify certain patterns in the images such as lattice periodicities, repeating structural elements, or microstructures, their automatic extraction and classification are highly non-trivial and universal pathways to accomplish such analyses are absent. We pose that the most distinctive elements of the patterns observed in STM and (S)TEM images are similarity and (almost-) periodicity, behaviors stemming directly from the parsimony of elementary atomic structures, superimposed on the gradual changes reflective of order parameter distributions. However, the discovery of these elements via global Fourier methods is non-trivial due to variability and lack of ideal discrete translation symmetry. To address this problem, we develop shift-invariant variational autoencoders (shift-VAE) that allow disentangling characteristic repeating features in the images, their variations, and shifts inevitable for random sampling of image space. Shift-VAEs balance the uncertainty in the position of the object of interest with the uncertainty in shape reconstruction. This approach is illustrated for model 1D data, and further extended to synthetic and experimental STM and STEM 2D data.

Transfer | Zero/Few/One-Shot | Adaptation (2 papers)

【1】 Classifying Textual Data with Pre-trained Vision Models through Transfer Learning and Data Transformations

Authors: Charaf Eddine Benarab
Note: 5 pages, 6 figures, 1 table
Link: https://arxiv.org/abs/2106.12479
Abstract: Knowledge is acquired by humans through experience, and no boundary is set between the kinds of knowledge or skill levels we can achieve on different tasks at the same time. When it comes to Neural Networks, that is not the case, the major breakthroughs in the field are extremely task and domain specific. Vision and language are dealt with in separate manners, using separate methods and different datasets. In this work, we propose to use knowledge acquired by benchmark Vision Models which are trained on ImageNet to help a much smaller architecture learn to classify text. After transforming the textual data contained in the IMDB dataset to gray scale images. An analysis of different domains and the Transfer Learning method is carried out. Despite the challenge posed by the very different datasets, promising results are achieved. The main contribution of this work is a novel approach which links large pretrained models on both language and vision to achieve state-of-the-art results in different sub-fields from the original task. Without needing high compute capacity resources. Specifically, Sentiment Analysis is achieved after transferring knowledge between vision and language models. BERT embeddings are transformed into grayscale images, these images are then used as training examples for pretrained vision models such as VGG16 and ResNet. Index Terms: Natural language, Vision, BERT, Transfer Learning, CNN, Domain Adaptation.
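The central transformation — flattening a text embedding into a grayscale image a pretrained CNN can consume — can be sketched as follows. The 768-dimensional vector here is random, standing in for a real BERT embedding, and the zero-padding and min-max normalization are our assumptions about the mapping:

```python
import numpy as np

def embedding_to_grayscale(embedding, side):
    """Reshape a 1-D text embedding into a `side x side` uint8 grayscale
    image: zero-pad to side*side, min-max scale to [0, 255], reshape."""
    padded = np.zeros(side * side)
    padded[: embedding.size] = embedding
    lo, hi = padded.min(), padded.max()
    img = (padded - lo) / (hi - lo + 1e-8) * 255.0
    return img.reshape(side, side).astype(np.uint8)

rng = np.random.default_rng(0)
fake_bert = rng.normal(size=768)  # stand-in for a 768-d BERT embedding
img = embedding_to_grayscale(fake_bert, side=28)
print(img.shape, img.dtype)       # a 28x28 image, ready for a vision model
```

A pretrained vision backbone (VGG16, ResNet, etc.) would then be fine-tuned on these images exactly as it would on ordinary grayscale inputs.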

【2】 Secure Domain Adaptation with Multiple Sources

Authors: Serban Stan, Mohammad Rostami
Affiliations: Department of Computer Science, University of Southern California
Link: https://arxiv.org/abs/2106.12124
Abstract: Multi-source unsupervised domain adaptation (MUDA) is a recently explored learning framework, where the goal is to address the challenge of labeled data scarcity in a target domain via transferring knowledge from multiple source domains with annotated data. Since the source data is distributed, the privacy of source domains' data can be a natural concern. We benefit from the idea of domain alignment in an embedding space to address the privacy concern for MUDA. Our method is based on aligning the sources and target distributions indirectly via internally learned distributions, without communicating data samples between domains. We justify our approach theoretically and perform extensive experiments to demonstrate that our method is effective and compares favorably against existing methods.

医学相关(1篇)

【1】 Adapting Off-the-Shelf Source Segmenter for Target Medical Image Segmentation 标题:自适应现成的源分割器用于目标医学图像分割

作者:Xiaofeng Liu,Fangxu Xing,Chao Yang,Georges El Fakhri,Jonghye Woo 机构: Gordon Center for Medical Imaging, Department of Radiology, Massachusetts, General Hospital and Harvard Medical School, Boston, MA, Facebook Artificial Intelligence, Boston, MA 备注:To appear in MICCAI 2021 链接:https://arxiv.org/abs/2106.12497 摘要:无监督域自适应(Unsupervised domain adaptation,UDA)旨在将从标记源域学到的知识迁移到一个未标记且未见过的目标域,通常需要同时使用两个域的数据进行训练。然而,由于数据存储或隐私问题,在自适应阶段对源域数据的访问往往是受限的。为缓解这一问题,本文针对无源UDA分割任务,提出在一个自适应的逐批归一化统计自适应框架下,将在源域预训练好的"现成"分割模型适配到目标域。具体地说,域特定的低阶批统计量(即均值和方差)通过指数动量衰减方案逐渐适应,而域共享的高阶批统计量(即缩放和平移参数)的一致性则由我们的优化目标显式地加以约束。我们首先自适应地度量每个通道的可迁移性,并据此平衡各通道的贡献。此外,所提出的无源UDA框架与自熵最小化等无监督学习方法是正交的,因此可以直接叠加到我们的框架之上。在BraTS 2018数据库上的大量实验表明,我们的无源UDA框架在跨亚型UDA分割任务上优于现有的源松弛UDA方法,并且在跨模态UDA分割任务上取得了与使用源数据的有监督UDA方法相当的结果。 摘要:Unsupervised domain adaptation (UDA) aims to transfer knowledge learned from a labeled source domain to an unlabeled and unseen target domain, which is usually trained on data from both domains. Access to the source domain data at the adaptation stage, however, is often limited, due to data storage or privacy issues. To alleviate this, in this work, we target source free UDA for segmentation, and propose to adapt an "off-the-shelf" segmentation model pre-trained in the source domain to the target domain, with an adaptive batch-wise normalization statistics adaptation framework. Specifically, the domain-specific low-order batch statistics, i.e., mean and variance, are gradually adapted with an exponential momentum decay scheme, while the consistency of domain shareable high-order batch statistics, i.e., scaling and shifting parameters, is explicitly enforced by our optimization objective. The transferability of each channel is adaptively measured first from which to balance the contribution of each channel. Moreover, the proposed source free UDA framework is orthogonal to unsupervised learning methods, e.g., self-entropy minimization, which can thus be simply added on top of our framework. Extensive experiments on the BraTS 2018 database show that our source free UDA framework outperformed existing source-relaxed UDA methods for the cross-subtype UDA segmentation task and yielded comparable results for the cross-modality UDA segmentation task, compared with supervised UDA methods with the source data.
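摘要中"指数动量衰减"地更新低阶批统计量(均值与方差)的思路,可以用如下极简的NumPy草图说明;衰减率decay等数值为本文假设的示例参数,并非论文官方实现。

```python
import numpy as np

def adapt_bn_stats(mu_s, var_s, target_batches, decay=0.94):
    """按指数动量衰减方案, 用目标域批次逐步更新源域的 BN 统计量。
    仅为依据摘要思路的示意实现; decay 为假设的超参数。"""
    mu, var = mu_s.copy(), var_s.copy()
    m = 1.0
    for x in target_batches:
        m *= decay  # 动量随迭代指数衰减: 越到后期越信任目标域统计量
        mu = m * mu + (1 - m) * x.mean(axis=0)
        var = m * var + (1 - m) * x.var(axis=0)
    return mu, var
```

实际的BN层还需对域共享的缩放/平移参数施加一致性约束,此处从略。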

蒸馏|知识提取(1篇)

【1】 Co-advise: Cross Inductive Bias Distillation 标题:协同指导:交叉归纳偏置蒸馏

作者:Sucheng Ren,Zhengqi Gao,Tianyu Hua,Zihui Xue,Yonglong Tian,Shengfeng He,Hang Zhao 机构:South China University of Technology, MIT, University of Texas at Austin, Tsinghua University, Shanghai Qi Zhi Institute 链接:https://arxiv.org/abs/2106.12378 摘要:Transformer最近从自然语言处理社区被引入视觉学习任务,作为基于卷积神经网络的模型的一种有前途的替代。然而,当训练数据量不足(例如ImageNet规模)时,它的优势就会退化。为了使之实用化,我们提出了一种新的基于蒸馏的视觉Transformer训练方法。与以往仅提供重量级卷积教师的工作不同,我们引入了具有不同架构归纳偏置(例如卷积和对合)的轻量级教师来协同指导学生Transformer。关键在于,具有不同归纳偏置的教师尽管在同一数据集上训练,仍会获得不同的知识,而这些不同的知识在蒸馏过程中相互复合,提升了学生的表现。配备了这种交叉归纳偏置蒸馏方法,我们的视觉Transformer(称为CivT)优于ImageNet上所有相同结构的先前Transformer。 摘要:Transformers recently are adapted from the community of natural language processing as a promising substitute of convolution-based neural networks for visual learning tasks. However, its supremacy degenerates given an insufficient amount of training data (e.g., ImageNet). To make it into practical utility, we propose a novel distillation-based method to train vision transformers. Unlike previous works, where merely heavy convolution-based teachers are provided, we introduce lightweight teachers with different architectural inductive biases (e.g., convolution and involution) to co-advise the student transformer. The key is that teachers with different inductive biases attain different knowledge despite that they are trained on the same dataset, and such different knowledge compounds and boosts the student's performance during distillation. Equipped with this cross inductive bias distillation method, our vision transformers (termed as CivT) outperform all previous transformers of the same architecture on ImageNet.
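摘要中"多教师协同指导"的核心可以用一个简化的蒸馏损失来说明:对每个具有不同归纳偏置的教师分别计算软标签KL散度后取平均。以下NumPy草图仅为示意,温度T等超参数是假设值,与论文的真实训练目标无关。

```python
import numpy as np

def softmax(z, T=1.0):
    """带温度的 softmax; 先减去最大值保证数值稳定。"""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def co_advise_loss(student_logits, teacher_logits_list, T=4.0):
    """多教师蒸馏损失的简化示意: 对每个教师的软标签分别计算
    KL(teacher || student) 后取平均。T 为假设的温度超参数。"""
    s = softmax(student_logits, T)
    losses = []
    for t_logits in teacher_logits_list:
        p = softmax(t_logits, T)
        kl = np.sum(p * (np.log(p + 1e-12) - np.log(s + 1e-12)), axis=-1)
        losses.append(kl.mean())
    return float(np.mean(losses))
```

当学生与所有教师的logits一致时损失为零;教师间归纳偏置不同,softened目标也不同,损失因此鼓励学生综合多种知识。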

聚类(2篇)

【1】 Better Algorithms for Individually Fair k-Clustering 标题:个体公平k-聚类的更好算法

作者:Deeparnab Chakrabarty,Maryam Negahbani 机构:Department of Computer Science, Dartmouth College, Hanover, NH 链接:https://arxiv.org/abs/2106.12150 摘要:在个体公平性的背景下,我们研究了以$\ell_p$-范数为目标(例如$k$-中值和$k$-均值)的数据聚类问题。数据集由$n$个点组成,我们希望找到$k$个中心,使得(a)目标最小化,同时(b)满足个体公平性约束,即每个点$v$在距离$r(v)$内有一个中心,其中$r(v)$是$v$到其第$(n/k)$近的点的距离。Jung、Kannan和Lutz[FORC 2020]引入了这个概念,并设计了一个聚类算法,对$\ell_\infty$即$k$-中心目标具有可证明的(近似)公平性和目标值保证。Mahabadi和Vakilian[ICML 2020]重新研究了这个问题,给出了适用于所有$\ell_p$-范数的局部搜索算法。从经验上看,对于$k$-中值和$k$-均值,他们的算法在成本方面大幅优于Jung等人的算法,但在公平性方面有一定损失。在本文中,我们的主要贡献是利用线性规划(LP)技术,在理论和实践上为该问题获得更好的算法。我们证明了通过修改已知的LP舍入技术,可以得到比MV20好得多的最坏情况目标保证,并且在经验上,这一目标值非常接近最优。此外,我们的理论公平性保证与MV20相当,而在经验上,我们得到了明显更公平的解。虽然精确求解LP的代价可能高得令人却步,但我们证明了在实践中,一个简单的稀疏化技术可以大幅缩短算法的运行时间。 摘要:We study data clustering problems with $\ell_p$-norm objectives (e.g. $k$-Median and $k$-Means) in the context of individual fairness. The dataset consists of $n$ points, and we want to find $k$ centers such that (a) the objective is minimized, while (b) respecting the individual fairness constraint that every point $v$ has a center within a distance at most $r(v)$, where $r(v)$ is $v$'s distance to its $(n/k)$th nearest point. Jung, Kannan, and Lutz [FORC 2020] introduced this concept and designed a clustering algorithm with provable (approximate) fairness and objective guarantees for the $\ell_\infty$ or $k$-Center objective. Mahabadi and Vakilian [ICML 2020] revisited this problem to give a local-search algorithm for all $\ell_p$-norms. Empirically, their algorithms outperform Jung et. al.'s by a large margin in terms of cost (for $k$-Median and $k$-Means), but they incur a reasonable loss in fairness. In this paper, our main contribution is to use Linear Programming (LP) techniques to obtain better algorithms for this problem, both in theory and in practice. We prove that by modifying known LP rounding techniques, one gets a worst-case guarantee on the objective which is much better than in MV20, and empirically, this objective is extremely close to the optimal. Furthermore, our theoretical fairness guarantees are comparable with MV20 in theory, and empirically, we obtain noticeably fairer solutions. Although solving the LP {\em exactly} might be prohibitive, we demonstrate that in practice, a simple sparsification technique drastically improves the run-time of our algorithm.
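摘要中个体公平约束的半径$r(v)$,即点$v$到其第$(n/k)$近的点的距离,可以按定义直接算出。下面是一个O(n²)的NumPy示意实现,未做任何近似或加速,仅用于说明该定义。

```python
import numpy as np

def fairness_radii(points, k):
    """按个体公平定义计算每个点的半径 r(v):
    v 到其第 ceil(n/k) 近的其他点的距离 (O(n^2) 示意实现)。"""
    n = len(points)
    h = int(np.ceil(n / k))
    # 两两欧氏距离矩阵
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    d.sort(axis=1)      # 每行第 0 列是到自身的距离 0
    return d[:, h]      # 第 h 列即第 h 近的其他点
```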

【2】 Clustering of check-in sequences using the mixture Markov chain process 标题:基于混合马尔科夫链过程的签到序列聚类

作者:Elena Shmileva,Viktor Sarzhan 机构:National Research University Higher School of Economics, St.Petersburg campus, Math, department,A Kantemirovskaya Street, St Petersburg, Russian Federation 备注:16 pages, 4 figures 链接:https://arxiv.org/abs/2106.12039 摘要:这项工作是致力于聚类的签入序列从一个地理社会网络。我们使用混合马尔可夫链过程作为时间相关数据类型的数学模型。对于聚类,我们调整了期望最大化(EM)算法。结果,我们获得了现已不复存在的地理社会网络Weeplaces用户的高度详细的社区(集群)。 摘要:This work is devoted to the clustering of check-in sequences from a geosocial network. We used the mixture Markov chain process as a mathematical model for time-dependent types of data. For clustering, we adjusted the Expectation-Maximization (EM) algorithm. As a result, we obtained highly detailed communities (clusters) of users of the now defunct geosocial network, Weeplaces.
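摘要中"混合马尔可夫链 + EM"的聚类流程可以用如下简化草图说明:E步按各分量的转移矩阵计算每条签到序列的责任度,M步做责任度加权的转移计数并归一化。这只是按通用EM思路写出的示意代码,初始分布的处理、平滑常数等细节均为假设,并非论文实现。

```python
import numpy as np

def em_mixture_markov(seqs, n_states, K, n_iter=100, seed=0):
    """混合马尔可夫链的 EM 聚类示意 (仅建模转移概率)。
    seqs: 整数状态序列的列表; 返回 (聚类标签, K 个转移矩阵)。"""
    rng = np.random.default_rng(seed)
    pi = np.full(K, 1.0 / K)                                  # 混合权重
    A = rng.dirichlet(np.ones(n_states), size=(K, n_states))  # K 个随机初始化的转移矩阵
    for _ in range(n_iter):
        # E 步: 每条序列在各分量下的对数似然 -> 责任度
        logr = np.zeros((len(seqs), K))
        for i, s in enumerate(seqs):
            for k in range(K):
                ll = np.log(pi[k] + 1e-12)
                for a, b in zip(s[:-1], s[1:]):
                    ll += np.log(A[k, a, b] + 1e-12)
                logr[i, k] = ll
        logr -= logr.max(axis=1, keepdims=True)
        r = np.exp(logr)
        r /= r.sum(axis=1, keepdims=True)
        # M 步: 责任度加权的转移计数 (加微小平滑避免除零)
        pi = r.mean(axis=0)
        counts = np.full((K, n_states, n_states), 1e-6)
        for i, s in enumerate(seqs):
            for a, b in zip(s[:-1], s[1:]):
                counts[:, a, b] += r[i]
        A = counts / counts.sum(axis=2, keepdims=True)
    return r.argmax(axis=1), A
```

对动态特性差异明显的两组序列(例如"倾向停留"与"倾向切换"的两条链),该过程通常能将其分入不同簇。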

自动驾驶|车辆|车道检测等(1篇)

【1】 Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers 标题:EURO-PVI:密集城市中心的行人车辆互动

作者:Apratim Bhattacharyya,Daniel Olmeda Reino,Mario Fritz,Bernt Schiele 备注:To appear at CVPR 2021 链接:https://arxiv.org/abs/2106.12442 摘要:行人和自行车路径的准确预测是在密集的城市环境中开发可靠的自动驾驶车辆必不可少的。车辆与行人或骑自行车者之间的相互作用对交通参与者的轨迹有重要影响,例如停车或转弯以避免碰撞。尽管最近的数据集和轨迹预测方法促进了自主车辆的发展,但建模的车辆-行人(骑自行车者)相互作用的数量很少。在这项工作中,我们提出欧洲PVI,行人和自行车的轨迹数据集。特别是,与现有数据集相比,我们的数据集在密集的城市场景中满足了更多样化和复杂的交互。为了解决在预测具有密集交互作用的未来轨迹时所面临的挑战,我们开发了一个联合推理模型,该模型可以学习城市场景中多个智能体之间的多模态共享潜在空间。这使得我们的联合-$\beta$-cVAE方法能够更好地模拟未来轨迹的分布。我们在nuScenes和Euro PVI数据集上获得了最新的结果,证明了捕捉自我车辆和行人(骑自行车者)之间的相互作用对于准确预测的重要性。 摘要:Accurate prediction of pedestrian and bicyclist paths is integral to the development of reliable autonomous vehicles in dense urban environments. The interactions between vehicle and pedestrian or bicyclist have a significant impact on the trajectories of traffic participants e.g. stopping or turning to avoid collisions. Although recent datasets and trajectory prediction approaches have fostered the development of autonomous vehicles yet the amount of vehicle-pedestrian (bicyclist) interactions modeled are sparse. In this work, we propose Euro-PVI, a dataset of pedestrian and bicyclist trajectories. In particular, our dataset caters more diverse and complex interactions in dense urban scenarios compared to the existing datasets. To address the challenges in predicting future trajectories with dense interactions, we develop a joint inference model that learns an expressive multi-modal shared latent space across agents in the urban scene. This enables our Joint-$\beta$-cVAE approach to better model the distribution of future trajectories. We achieve state of the art results on the nuScenes and Euro-PVI datasets demonstrating the importance of capturing interactions between ego-vehicle and pedestrians (bicyclists) for accurate predictions.

联邦学习|隐私保护|加密(2篇)

【1】 Fine-Grained Data Selection for Improved Energy Efficiency of Federated Edge Learning 标题:提升联邦边缘学习能效的细粒度数据选择

作者:Abdullatif Albaseer,Mohamed Abdallah,Ala Al-Fuqaha,Aiman Erbad 机构: Hamad Bin Khlifa University 链接:https://arxiv.org/abs/2106.12561 摘要:在联邦边缘学习(FEEL)中,网络边缘的能量受限设备在训练和上传本地机器学习模型时会消耗大量的能量,导致其寿命缩短。本文通过综合考虑本地训练数据、可用的计算与通信资源以及FEEL轮次的截止时间约束,提出了降低能耗的新型节能FEEL解决方案。本文考虑了这样一个系统模型:边缘服务器配备多根天线,采用波束成形技术通过正交信道与本地用户通信。具体地说,我们考虑的问题旨在寻找最优的用户资源,包括相关训练样本的细粒度选择、带宽、发射功率、波束成形权重和处理速度,其目标是在给定FEEL通信轮次截止时间约束的情况下最小化总能耗。为此,我们首先提出了一种新的细粒度训练算法,该算法排除相关性较低的训练样本,只有效地选择能提高模型性能的样本;随后推导出闭式解,并给出一种基于黄金分割的迭代算法,以寻找使能耗最小的最优计算和通信资源。使用MNIST和CIFAR-10数据集进行的实验表明,我们提出的算法明显优于最先进的解决方案:在MNIST和CIFAR-10数据集上能耗分别降低了79%和73%。 摘要:In Federated edge learning (FEEL), energy-constrained devices at the network edge consume significant energy when training and uploading their local machine learning models, leading to a decrease in their lifetime. This work proposes novel solutions for energy-efficient FEEL by jointly considering local training data, available computation, and communications resources, and deadline constraints of FEEL rounds to reduce energy consumption. This paper considers a system model where the edge server is equipped with multiple antennas employing beamforming techniques to communicate with the local users through orthogonal channels. Specifically, we consider a problem that aims to find the optimal user's resources, including the fine-grained selection of relevant training samples, bandwidth, transmission power, beamforming weights, and processing speed with the goal of minimizing the total energy consumption given a deadline constraint on the communication rounds of FEEL. Then, we devise tractable solutions by first proposing a novel fine-grained training algorithm that excludes less relevant training samples and effectively chooses only the samples that improve the model's performance. After that, we derive closed-form solutions, followed by a Golden-Section-based iterative algorithm to find the optimal computation and communication resources that minimize energy consumption. Experiments using MNIST and CIFAR-10 datasets demonstrate that our proposed algorithms considerably outperform the state-of-the-art solutions as energy consumption decreases by 79% for MNIST and 73% for CIFAR-10 datasets.
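摘要提到的"基于黄金分割的迭代算法"本质上是对单峰目标的一维区间收缩搜索。下面给出通用黄金分割极小化的示意实现,与论文求解的具体能耗模型无关,仅演示该搜索过程本身。

```python
import math

def golden_section_min(f, a, b, tol=1e-6):
    """黄金分割法在区间 [a, b] 上极小化单峰函数 f 的示意实现。
    每次迭代把区间缩小到原来的约 0.618 倍, 直到宽度小于 tol。"""
    invphi = (math.sqrt(5) - 1) / 2   # 黄金比的倒数, 约 0.618
    c = b - invphi * (b - a)          # 较低的探测点
    d = a + invphi * (b - a)          # 较高的探测点
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c
            c = b - invphi * (b - a)
        else:
            a, c = c, d
            d = a + invphi * (b - a)
    return (a + b) / 2
```

例如对凸的能耗曲线 f(x) = (x-2)² + 1,在 [0, 5] 上搜索即可收敛到极小点附近。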

【2】 Behavior Mimics Distribution: Combining Individual and Group Behaviors for Federated Learning 标题:行为模拟分布:将个体行为和群体行为结合起来进行联邦学习

作者:Hua Huang,Fanhua Shang,Yuanyuan Liu,Hongying Liu 机构:Key Lab of Intelligent Perception and Image Understanding of Ministry of Education, School of Artificial Intelligence, Xidian University, China, Peng Cheng Lab, Shenzhen, China 备注:This paper has been accepted by International Joint Conference on Artificial Intelligence (IJCAI) 2021 链接:https://arxiv.org/abs/2106.12300 摘要:联邦学习(FL)已经成为一种活跃的、有前途的分布式机器学习模式。由于统计上的异质性,最近的研究清楚地表明,流行的FL方法(例如FedAvg)的性能由于本地更新引起的客户端漂移而急剧恶化。本文提出了一种新的联合学习算法(IGFL),它利用个体和群体的行为来模拟分布,从而提高了对异质性的处理能力。与现有的FL方法不同,我们的IGFL可以应用于客户机和服务器优化。作为一个副产品,我们提出了一种新的基于注意的联邦学习在服务器优化的IGFL。据我们所知,这是第一次将注意机制纳入联邦优化。我们进行了大量的实验,结果表明IGFL可以显著提高现有联邦学习方法的性能。特别是当个体间的数据分布不同时,IGFL可以将分类精度提高13%左右。 摘要:Federated Learning (FL) has become an active and promising distributed machine learning paradigm. As a result of statistical heterogeneity, recent studies clearly show that the performance of popular FL methods (e.g., FedAvg) deteriorates dramatically due to the client drift caused by local updates. This paper proposes a novel Federated Learning algorithm (called IGFL), which leverages both Individual and Group behaviors to mimic distribution, thereby improving the ability to deal with heterogeneity. Unlike existing FL methods, our IGFL can be applied to both client and server optimization. As a by-product, we propose a new attention-based federated learning in the server optimization of IGFL. To the best of our knowledge, this is the first time to incorporate attention mechanisms into federated optimization. We conduct extensive experiments and show that IGFL can significantly improve the performance of existing federated learning methods. Especially when the distributions of data among individuals are diverse, IGFL can improve the classification accuracy by about 13% compared with prior baselines.

推理|分析|理解|解释(11篇)

【1】 Feature Attributions and Counterfactual Explanations Can Be Manipulated 标题:特征归因和反事实解释可能被操纵

作者:Dylan Slack,Sophie Hilgard,Sameer Singh,Hima Lakkaraju 机构:† 备注:arXiv admin note: text overlap with arXiv:2106.02666 链接:https://arxiv.org/abs/2106.12563 摘要:随着机器学习模型越来越多地用于关键决策环境(如医疗保健、金融),人们越来越重视开发解释模型预测的方法。这种解释用于理解模型并建立对模型的信任,是机器学习管道中的重要组成部分。尽管解释是这些系统中的关键部分,但人们对它们如何容易被对手操纵却知之甚少。在本文中,我们讨论了两大类解释如何容易被操纵。我们演示了对手如何设计有偏见的模型,操纵与模型无关的特征归因方法(例如LIME和SHAP)以及在反事实搜索过程中爬山的反事实解释方法(例如Wachter算法和DiCE),从而隐藏模型的偏见。这些漏洞允许对手部署有偏见的模型,而解释不会揭示这种偏见,从而欺骗利益相关者信任该模型。我们在真实世界数据集(包括COMPAS和Communities & Crime)上评估了这些操纵,并发现解释在实践中是可以被操纵的。 摘要:As machine learning models are increasingly used in critical decision-making settings (e.g., healthcare, finance), there has been a growing emphasis on developing methods to explain model predictions. Such \textit{explanations} are used to understand and establish trust in models and are vital components in machine learning pipelines. Though explanations are a critical piece in these systems, there is little understanding about how they are vulnerable to manipulation by adversaries. In this paper, we discuss how two broad classes of explanations are vulnerable to manipulation. We demonstrate how adversaries can design biased models that manipulate model agnostic feature attribution methods (e.g., LIME \& SHAP) and counterfactual explanations that hill-climb during the counterfactual search (e.g., Wachter's Algorithm \& DiCE) into \textit{concealing} the model's biases. These vulnerabilities allow an adversary to deploy a biased model, yet explanations will not reveal this bias, thereby deceiving stakeholders into trusting the model. We evaluate the manipulations on real world data sets, including COMPAS and Communities \& Crime, and find explanations can be manipulated in practice.

【2】 Synthetic Benchmarks for Scientific Research in Explainable Machine Learning 标题:面向可解释机器学习科学研究的合成基准

作者:Yang Liu,Sujay Khandagale,Colin White,Willie Neiswanger 机构:AI 2Stanford University 链接:https://arxiv.org/abs/2106.12543 摘要:随着机器学习模型变得越来越复杂,它们的应用变得越来越重要,解释模型预测的工具变得越来越重要。尽管可解释性技术被广泛使用,评估和比较不同的特征属性方法仍然具有挑战性:评估理想情况下需要人类研究,而经验评估指标在实际数据集的计算上往往是禁止的。在这项工作中,我们通过发布XAI Bench来解决这个问题:一套合成数据集和一个用于基准特性属性算法的库。与真实世界的数据集不同,合成数据集允许有效地计算条件期望值,这些值是评估基本真值Shapley值和其他度量所需的。我们发布的合成数据集提供了各种各样的参数,可以配置这些参数来模拟真实世界的数据。我们通过对流行的解释性技术进行多个评估指标的基准测试,并识别流行解释者的失败模式,来展示我们的库的强大功能。我们图书馆的效率将有助于从开发到部署带来新的解释方法。 摘要:As machine learning models grow more complex and their applications become more high-stakes, tools for explaining model predictions have become increasingly important. Despite the widespread use of explainability techniques, evaluating and comparing different feature attribution methods remains challenging: evaluations ideally require human studies, and empirical evaluation metrics are often computationally prohibitive on real-world datasets. In this work, we address this issue by releasing XAI-Bench: a suite of synthetic datasets along with a library for benchmarking feature attribution algorithms. Unlike real-world datasets, synthetic datasets allow the efficient computation of conditional expected values that are needed to evaluate ground-truth Shapley values and other metrics. The synthetic datasets we release offer a wide variety of parameters that can be configured to simulate real-world data. We demonstrate the power of our library by benchmarking popular explainability techniques across several evaluation metrics and identifying failure modes for popular explainers. The efficiency of our library will help bring new explainability methods from development to deployment.
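XAI-Bench之所以采用合成数据,是因为真值Shapley值依赖可精确计算的条件期望。经典Shapley值本身可以在特征数较小时精确枚举,下面给出一个通用示意实现;value_fn的具体定义(例如基于条件期望的价值函数)由调用者提供,这里仅用一个可加博弈演示其正确性,与XAI-Bench库的API无关。

```python
import itertools
import math

def exact_shapley(value_fn, n_features):
    """精确 Shapley 值: 枚举所有不含 i 的子集 S, 按经典权重
    |S|!(n-|S|-1)!/n! 加权边际贡献 value_fn(S+{i}) - value_fn(S)。
    复杂度 O(n * 2^n), 仅适用于小特征数。"""
    n = n_features
    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for r in range(n):
            for S in itertools.combinations(others, r):
                w = (math.factorial(len(S)) * math.factorial(n - len(S) - 1)
                     / math.factorial(n))
                phi[i] += w * (value_fn(set(S) | {i}) - value_fn(set(S)))
    return phi
```

对可加博弈,每个特征的Shapley值恰等于其自身权重,且满足效率性(各归因之和等于全集价值减空集价值)。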

【3】 How Well do Feature Visualizations Support Causal Understanding of CNN Activations? 标题:特征可视化在多大程度上支持对CNN激活的因果理解?

作者:Roland S. Zimmermann,Judy Borowski,Robert Geirhos,Matthias Bethge,Thomas S. A. Wallis,Wieland Brendel 机构: 1University of Tübingen, Germany 2TechnischeUniversitätDarmstadt 备注:ICML 2021 XAI workshop version. Joint first and last authors. Project website at this https URL 链接:https://arxiv.org/abs/2106.12447 摘要:理解深度卷积神经网络内部工作的一种广泛使用的方法是通过激活最大化来可视化单元响应。通过激活最大化得到的特征可视化被认为能为人类提供关于导致一个单元被激活的图像特征的精确信息。如果这是真的,这些合成图像应该能让人类预测干预的效果,比如遮挡图像的某个区域(比如狗的头部)是否会改变一个单元的激活。在这里,我们通过让人类预测两个方形遮挡中的哪一个会导致单元激活发生更大的变化来检验这个假设。大规模众包实验和专家测量均表明,平均而言,Olah等人(2017年)的极强激活特征可视化确实有助于人类完成这项任务(准确率为$67\pm4\%$;没有任何可视化的基线性能是$60\pm3\%$)。但是,与其他可视化(例如数据集样本)相比,它们没有提供任何显著的优势,后者产生类似的性能(准确率为$66\pm3\%$至$67\pm3\%$)。综上所述,我们提出了一个客观的心理物理学任务来量化单元级可解释性方法对人类的益处,并且没有发现任何证据表明特征可视化比简单的替代可视化能为人类提供更好的"因果理解"。 摘要:One widely used approach towards understanding the inner workings of deep convolutional neural networks is to visualize unit responses via activation maximization. Feature visualizations via activation maximization are thought to provide humans with precise information about the image features that cause a unit to be activated. If this is indeed true, these synthetic images should enable humans to predict the effect of an intervention, such as whether occluding a certain patch of the image (say, a dog's head) changes a unit's activation. Here, we test this hypothesis by asking humans to predict which of two square occlusions causes a larger change to a unit's activation. Both a large-scale crowdsourced experiment and measurements with experts show that on average, the extremely activating feature visualizations by Olah et al. (2017) indeed help humans on this task ($67 \pm 4\%$ accuracy; baseline performance without any visualizations is $60 \pm 3\%$). However, they do not provide any significant advantage over other visualizations (such as e.g. dataset samples), which yield similar performance ($66 \pm 3\%$ to $67 \pm 3\%$ accuracy). Taken together, we propose an objective psychophysical task to quantify the benefit of unit-level interpretability methods for humans, and find no evidence that feature visualizations provide humans with better "causal understanding" than simple alternative visualizations.

【4】 First Step Towards EXPLAINable DGA Multiclass Classification 标题:迈向可解释DGA多类分类的第一步

作者:Arthur Drichel,Nils Faerber,Ulrike Meyer 机构:RWTH Aachen University 备注:None 链接:https://arxiv.org/abs/2106.12336 摘要:许多恶意软件家族依靠域生成算法(DGA)建立与指挥控制(C2)服务器的连接。为了对抗DGA,已经提出了一些机器学习分类器,能够识别生成特定域名的DGA,从而触发有针对性的补救措施。然而,现有的最先进分类器都是基于深度学习模型的。这些模型的黑盒性质使得很难评估其推理过程,由此产生的信任缺失使得此类模型难以实际应用。在本文中,我们提出了EXPLAIN,一种基于特征的无上下文DGA多类分类器。在同一真实数据的统一环境下,我们比较了我们方法的多种特征集与超参数组合,并与几种最先进的分类器进行了比较。我们的分类器取得了有竞争力的结果,具备实时能力,并且与相关工作中提出的DGA多类分类器相比,其预测更容易追溯到具体特征。 摘要:Numerous malware families rely on domain generation algorithms (DGAs) to establish a connection to their command and control (C2) server. Counteracting DGAs, several machine learning classifiers have been proposed enabling the identification of the DGA that generated a specific domain name and thus triggering targeted remediation measures. However, the proposed state-of-the-art classifiers are based on deep learning models. The black box nature of these makes it difficult to evaluate their reasoning. The resulting lack of confidence makes the utilization of such models impracticable. In this paper, we propose EXPLAIN, a feature-based and contextless DGA multiclass classifier. We comparatively evaluate several combinations of feature sets and hyperparameters for our approach against several state-of-the-art classifiers in a unified setting on the same real-world data. Our classifier achieves competitive results, is real-time capable, and its predictions are easier to trace back to features than the predictions made by the DGA multiclass classifiers proposed in related work.
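"基于特征的无上下文"分类通常直接从域名字符串提取统计特征。下面的纯Python草图给出几个常见的示例特征(长度、香农熵、数字与元音占比);这些特征仅为说明思路的假设示例,并不代表论文实际使用的特征集。

```python
import math
from collections import Counter

def dga_features(domain):
    """从域名 (取第一个标签, 即最左侧的 e2LD 部分) 提取简单统计特征。
    算法生成的随机域名通常熵更高、元音更少、数字更多。"""
    name = domain.split('.')[0].lower()
    counts = Counter(name)
    n = max(len(name), 1)
    entropy = -sum(c / n * math.log2(c / n) for c in counts.values())
    digits = sum(ch.isdigit() for ch in name) / n
    vowels = sum(ch in 'aeiou' for ch in name) / n
    return {'length': len(name), 'entropy': round(entropy, 3),
            'digit_ratio': round(digits, 3), 'vowel_ratio': round(vowels, 3)}
```

例如,类似随机串的域名(如 xj4k9q2zv7.net)的熵会明显高于自然语言域名(如 google.com),这类差异正是特征分类器可解释性的来源。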

【5】 Learning Explainable Representations of Malware Behavior 标题:学习恶意软件行为的可解释表示

作者:Paul Prasse,Jan Brabec,Jan Kohout,Martin Kopp,Lukas Bajer,Tobias Scheffer 机构: University of Potsdam, Department of Computer Science, Germany, Cisco Systems, Cognitive Intelligence, Prague, Czech Republic 备注:This is a pre-print of an article to appear in Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2021 链接:https://arxiv.org/abs/2106.12328 摘要:我们解决在网络遥测日志中识别恶意软件的问题,并提供\emph{危害指标}——对识别威胁的行为模式的可理解解释。在我们的系统中,一组专门的检测器首先将网络流数据抽象为可理解的网络事件。我们开发了一个神经网络来处理这一系列事件,并识别特定的威胁、恶意软件家族和广泛的恶意软件类别。然后,我们使用\emph{integratedgradients}方法突出显示共同构成威胁特征行为模式的事件。我们比较了基于CNNs、LSTMs和transformers的网络结构,并通过实验探讨了无监督预训练对大规模遥测数据的有效性。我们演示了该系统如何根据行为模式检测njRAT和其他恶意软件。 摘要:We address the problems of identifying malware in network telemetry logs and providing \emph{indicators of compromise} -- comprehensible explanations of behavioral patterns that identify the threat. In our system, an array of specialized detectors abstracts network-flow data into comprehensible \emph{network events} in a first step. We develop a neural network that processes this sequence of events and identifies specific threats, malware families and broad categories of malware. We then use the \emph{integrated-gradients} method to highlight events that jointly constitute the characteristic behavioral pattern of the threat. We compare network architectures based on CNNs, LSTMs, and transformers, and explore the efficacy of unsupervised pre-training experimentally on large-scale telemetry data. We demonstrate how this system detects njRAT and other malware based on behavioral patterns.
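摘要中用于突出关键事件的"积分梯度"方法本身与具体模型无关:沿基线到输入的直线路径对梯度取平均,再乘以输入与基线之差。下面是一个通用的NumPy示意实现(中点黎曼和近似路径积分),grad_fn需由调用者提供,与论文的网络结构无关。

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline=None, steps=64):
    """积分梯度: attribution = (x - baseline) * 平均梯度,
    梯度在 baseline 到 x 的直线路径上用中点法则采样平均。
    grad_fn(z) 返回模型输出对输入 z 的梯度。"""
    if baseline is None:
        baseline = np.zeros_like(x)
    alphas = (np.arange(steps) + 0.5) / steps   # 路径上的中点采样
    grads = np.mean([grad_fn(baseline + a * (x - baseline)) for a in alphas],
                    axis=0)
    return (x - baseline) * grads
```

对线性模型 f(x)=w·x,归因恰为 w⊙x,并满足完备性:各维归因之和等于 f(x)-f(baseline)。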

【6】 BiblioDAP: The 1st Workshop on Bibliographic Data Analysis and Processing 标题:BiblioDAP:首届书目数据分析与处理研讨会

作者:Zeyd Boukhers,Philipp Mayr,Silvio Peroni 机构:Institute for Web Science and, Technologies, University of Koblenz-Landau, Koblenz, Germany, GESIS – Leibniz-Institute for the, Social Sciences, Cologne, Germany, Research Centre for, Open Scholarly Metadata, Department of Classical Philology, and Italian Studies 备注:This workshop will be held in conjunction with KDD' 2021 链接:https://arxiv.org/abs/2106.12320 摘要:书目数据的自动处理在数字图书馆、数据科学和机器学习中变得非常重要,因为它的重要性与每年发表论文的显著增长同步,同时也面临着来自另一方面的固有挑战。这种处理有几个方面,包括但不限于I)从PDF文档中自动提取参考文献,II)构建准确的引文图,III)作者姓名消歧等。书目数据本质上是异构的,以结构化(如引文图)和非结构化(如出版物)两种格式出现。因此,需要对数据科学和机器学习技术进行处理和分析。这里我们介绍BiblioDAP'21:第一届书目数据分析与处理研讨会。 摘要:Automatic processing of bibliographic data becomes very important in digital libraries, data science and machine learning due to its importance in keeping pace with the significant increase of published papers every year from one side and to the inherent challenges from the other side. This processing has several aspects including but not limited to I) Automatic extraction of references from PDF documents, II) Building an accurate citation graph, III) Author name disambiguation, etc. Bibliographic data is heterogeneous by nature and occurs in both structured (e.g. citation graph) and unstructured (e.g. publications) formats. Therefore, it requires data science and machine learning techniques to be processed and analysed. Here we introduce BiblioDAP'21: The 1st Workshop on Bibliographic Data Analysis and Processing.

【7】 Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis 标题:你应该再深入一点吗?基于感受野分析的免训练卷积神经网络结构优化

作者:Mats L. Richter,Julius Schöning,Ulf Krumnack 机构:Osnabrueck, Germany, Julius Sch¨oning, Osnabr¨uck University of Applied Sciences 备注:Preprint 链接:https://arxiv.org/abs/2106.12307 摘要:将人工神经网络(ANN)应用到特定任务中,研究人员、程序员和其他专家通常会在设计中过多地使用卷积层。这意味着,这些人工神经网络包含太多的参数,需要在不影响结果的情况下进行不必要的训练。卷积层所能处理的特征受到其感受野的严格限制。通过逐层分析感受野的扩展,我们可以可靠地预测在给定的神经网络结构中,对推理没有定性贡献的层序列。基于这些分析,我们提出了解决这些低效率的设计策略,优化了人工神经网络的可解释性和计算性能。由于这些策略和分析都不需要对实际模型进行训练,因此这些洞察使得人工神经网络体系结构的设计过程非常有效,将来可能会实现自动化。 摘要:Applying artificial neural networks (ANN) to specific tasks, researchers, programmers, and other specialists usually overshot the number of convolutional layers in their designs. By implication, these ANNs hold too many parameters, which needed unnecessarily trained without impacting the result. The features, a convolutional layer can process, are strictly limited by its receptive field. By layer-wise analyzing the expansion of the receptive fields, we can reliably predict sequences of layers that will not contribute qualitatively to the inference in thegiven ANN architecture. Based on these analyses, we propose design strategies to resolve these inefficiencies, optimizing the explainability and the computational performance of ANNs. Since neither the strategies nor the analysis requires training of the actual model, these insights allow for a very efficient design process of ANNs architectures which might be automated in the future.
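摘要所依据的"逐层感受野扩展"可以用几行代码算出:每层输出感受野满足 r_out = r_in + (k-1)·jump,步幅累积满足 jump_out = jump_in·s。以下为通用示意实现,仅考虑卷积/池化的核大小与步幅,忽略空洞卷积等情形。

```python
def receptive_fields(layers):
    """逐层计算感受野。layers 为 (kernel, stride) 元组序列,
    返回每一层输出相对输入的感受野大小列表。"""
    r, jump = 1, 1
    out = []
    for k, s in layers:
        r = r + (k - 1) * jump   # 当前层把感受野扩大 (k-1)*jump
        jump = jump * s          # 步幅在深度方向上累积
        out.append(r)
    return out
```

例如两个3×3卷积后感受野为5,再接2×2步幅2的池化与一个3×3卷积后为10;一旦感受野超过输入分辨率,继续加层便难以带来定性的新特征,这正是文中判定"无贡献层序列"的依据。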

【8】 Improved Acyclicity Reasoning for Bayesian Network Structure Learning with Constraint Programming 标题:基于约束规划的贝叶斯网络结构学习中改进的无环性推理

作者:Fulya Trösser,Simon de Givry,George Katsirelos 机构:Universit´e de Toulouse, INRAE, UR MIAT, F-, Castanet-Tolosan, France, UMR MIA-Paris, INRAE, AgroParisTech, Univ. Paris-Saclay, Paris, France 备注:None 链接:https://arxiv.org/abs/2106.12269 摘要:贝叶斯网络是一种概率图形模型,具有广泛的应用领域,包括基因调控网络推理、风险分析和图像处理。从离散数据中学习贝叶斯网络(BNSL)的结构是一个NP-hard任务,具有超指数的有向无环图搜索空间。在这项工作中,我们提出了一个新的多项式时间算法来发现所有可能的聚类割集的子集,一个贪心算法来近似求解得到的线性规划,以及一个广义弧一致性算法来解决无环约束。我们将它们嵌入到基于约束编程的分支定界求解器CPBayes中,并表明尽管它们是次优的,但它们的性能提高了几个数量级。所得到的解算器也与GOBNILP相比较,GOBNILP是解决BNSL问题的最先进的解算器,它解决了NP难问题,能够发现每个割点并精确地求解线性规划。 摘要:Bayesian networks are probabilistic graphical models with a wide range of application areas including gene regulatory networks inference, risk analysis and image processing. Learning the structure of a Bayesian network (BNSL) from discrete data is known to be an NP-hard task with a superexponential search space of directed acyclic graphs. In this work, we propose a new polynomial time algorithm for discovering a subset of all possible cluster cuts, a greedy algorithm for approximately solving the resulting linear program, and a generalised arc consistency algorithm for the acyclicity constraint. We embed these in the constraint programmingbased branch-and-bound solver CPBayes and show that, despite being suboptimal, they improve performance by orders of magnitude. The resulting solver also compares favourably with GOBNILP, a state-of-the-art solver for the BNSL problem which solves an NP-hard problem to discover each cut and solves the linear program exactly.

【9】 ADAVI: Automatic Dual Amortized Variational Inference Applied To Pyramidal Bayesian Models 标题:ADAVI:应用于金字塔贝叶斯模型的自动对偶摊销变分推理

作者:Louis Rouillard,Demian Wassermann 机构:Université Paris-Saclay, Inria, CEA, Palaiseau, France 备注:None 链接:https://arxiv.org/abs/2106.12248 摘要:人口研究通常以金字塔式组织的数据为特征,这些数据用带有plate(平板)结构的层次贝叶斯模型(HBM)来表示。在神经成像等场景中,这些模型可能变得异常庞大:一个样本由功能性MRI信号组成,该信号在4个测量会话中于6.4万个大脑位置上测得,且受试者至少有数十名。即便是一个只涉及某皮层区域300个大脑位置的简化例子,也有大约100万个参数,这妨碍了基于模拟的推理(SBI)等现代密度估计技术的使用。为了在这类具有挑战性的问题中推断参数的后验分布,我们设计了一种新的方法来自动产生与目标HBM对偶的变分族。这个变分族表示为一个神经网络,由一个基于注意力的分层编码器和一组归一化流组合而成,编码器将摘要统计量提供给这些流。我们自动导出的神经网络利用了富含plate结构的HBM中的可交换性,并对其参数空间进行因子分解。由此产生的体系结构相对于典型的SBI表示将参数量减少了若干个数量级,同时保持了表达能力。我们的方法以摊销的方式对指定的HBM进行推断:一旦训练完成,它可以很容易地应用于新的数据样本来计算参数的完整后验。我们在模拟数据以及一个具有挑战性的高维大脑分区实验上证明了我们方法的能力。我们还提出了若干处于SBI技术与结构化变分推理交叉点上的开放问题。 摘要:Frequently, population studies feature pyramidally-organized data represented using Hierarchical Bayesian Models (HBM) enriched with plates. These models can become prohibitively large in settings such as neuroimaging, where a sample is composed of a functional MRI signal measured on 64 thousand brain locations, across 4 measurement sessions, and at least tens of subjects. Even a reduced example on a specific cortical region of 300 brain locations features around 1 million parameters, hampering the usage of modern density estimation techniques such as Simulation-Based Inference (SBI). To infer parameter posterior distributions in this challenging class of problems, we designed a novel methodology that automatically produces a variational family dual to a target HBM. This variational family, represented as a neural network, consists in the combination of an attention-based hierarchical encoder feeding summary statistics to a set of normalizing flows. Our automatically-derived neural network exploits exchangeability in the plate-enriched HBM and factorizes its parameter space. The resulting architecture reduces by orders of magnitude its parameterization with respect to that of a typical SBI representation, while maintaining expressivity. Our method performs inference on the specified HBM in an amortized setup: once trained, it can readily be applied to a new data sample to compute the parameters' full posterior. We demonstrate the capability of our method on simulated data, as well as a challenging high-dimensional brain parcellation experiment. We also open up several questions that lie at the intersection between SBI techniques and structured Variational Inference.

【10】 It's All in the Heads: Using Attention Heads as a Baseline for Cross-Lingual Transfer in Commonsense Reasoning 标题:一切尽在注意力头中:使用注意力头作为常识推理中跨语言迁移的基线

作者:Alexey Tikhonov,Max Ryabinin 机构:Berlin, Germany, Yandex, HSE University, Moscow, Russia 备注:Accepted to Findings of ACL 2021. 13 pages, 4 figures. Code: this https URL 链接:https://arxiv.org/abs/2106.12066 摘要:常识推理是自然语言处理中的关键问题之一,但标记数据的相对匮乏阻碍了英语以外语言的相关进展。预训练的跨语言模型是强大的语言无关表征的来源,但其固有的推理能力仍在积极研究中。在这项工作中,我们设计了一个简单的常识推理方法,即以多头注意力权重为特征训练一个线性分类器。为了评估这种方法,我们通过在一个标准化的管道中处理来自先前工作的多个数据集,创建了一个多语言Winograd模式语料库,并从样本外性能的角度来衡量跨语言泛化能力。该方法与最近的有监督和无监督的常识推理方法相比具有竞争力,即使以零样本(zero-shot)方式应用于其他语言也是如此。此外,我们还证明,对于所有研究的语言,大多数性能都由同一小部分注意力头提供,这为多语言编码器中存在通用推理能力提供了证据。 摘要:Commonsense reasoning is one of the key problems in natural language processing, but the relative scarcity of labeled data holds back the progress for languages other than English. Pretrained cross-lingual models are a source of powerful language-agnostic representations, yet their inherent reasoning capabilities are still actively studied. In this work, we design a simple approach to commonsense reasoning which trains a linear classifier with weights of multi-head attention as features. To evaluate this approach, we create a multilingual Winograd Schema corpus by processing several datasets from prior work within a standardized pipeline and measure cross-lingual generalization ability in terms of out-of-sample performance. The method performs competitively with recent supervised and unsupervised approaches for commonsense reasoning, even when applied to other languages in a zero-shot manner. Also, we demonstrate that most of the performance is given by the same small subset of attention heads for all studied languages, which provides evidence of universal reasoning capabilities in multilingual encoders.

【11】 Analysis of the Evolution of Parametric Drivers of High-End Sea-Level Hazards 标题:高端海平面灾害的参数驱动因素演变分析

作者:Alana Hough,Tony E. Wong 机构:School of Mathematical Sciences, Rochester Institute of Technology, Rochester, NY , USA 链接:https://arxiv.org/abs/2106.12041 摘要:气候模型是制定战略管理海平面上升给沿海社区带来的风险的关键工具。虽然这些模型对于理解气候风险是必要的,但模型中的每个参数都有一定程度的不确定性。该模型的参数不确定性导致了未来气候风险的不确定性。因此,有必要了解这些参数的不确定性如何影响我们对未来气候风险的评估以及管理这些风险的战略的效力。在这里,我们使用随机森林来检验未来气候风险的参数驱动因素,以及这些驱动因素的相对重要性如何随时间变化。我们发现,在2020年至2150年间,平衡气候敏感性和衡量气溶胶对辐射强迫影响的因子始终是低辐射强迫情景和高辐射强迫情景下最重要的气候模式参数不确定性。高端海平面上升的短期危害主要由热膨胀驱动,而长期危害则与南极和格陵兰冰原的质量损失有关。我们的结果强调了在制定管理未来气候风险的策略时考虑时间演变的参数不确定性的实际重要性。 摘要:Climate models are critical tools for developing strategies to manage the risks posed by sea-level rise to coastal communities. While these models are necessary for understanding climate risks, there is a level of uncertainty inherent in each parameter in the models. This model parametric uncertainty leads to uncertainty in future climate risks. Consequently, there is a need to understand how those parameter uncertainties impact our assessment of future climate risks and the efficacy of strategies to manage them. Here, we use random forests to examine the parametric drivers of future climate risk and how the relative importances of those drivers change over time. We find that the equilibrium climate sensitivity and a factor that scales the effect of aerosols on radiative forcing are consistently the most important climate model parametric uncertainties throughout the 2020 to 2150 interval for both low and high radiative forcing scenarios. The near-term hazards of high-end sea-level rise are driven primarily by thermal expansion, while the longer-term hazards are associated with mass loss from the Antarctic and Greenland ice sheets. Our results highlight the practical importance of considering time-evolving parametric uncertainties when developing strategies to manage future climate risks.

检测相关(3篇)

【1】 False perfection in machine prediction: Detecting and assessing circularity problems in machine learning 标题:机器预测中的错误完美:检测和评估机器学习中的循环性问题

作者:Michael Hagmann,Stefan Riezler 机构:Computational Linguistics, Heidelberg University, Heidelberg, Germany, Interdisciplinary Center for Scientific Computing (IWR), Heidelberg. These authors contributed equally to this work. 链接:https://arxiv.org/abs/2106.12417 摘要:机器学习算法从输入数据和目标输出的模式中训练模型,目的是为未见过的测试输入预测正确的输出。在这里,我们展示了机器学习在医学信息学或专利法等重要应用领域中的一个问题:输入数据的表示中包含了能够确定性地定义目标输出的测量量。这会产生基于机器重构已知目标定义的完美但循环的预测,而在这些定义性测量量缺失或不完整的真实数据上则会失效。我们提出了一种循环性检验,对给定的数据集和黑盒机器学习模型,判断目标的函数定义是否可被重构并已被用于训练。我们认为,要将研究成果转移到现实世界的应用,就必须在机器学习中把定义目标结果的测量量与数据表示分离开来,从而避免循环性。 摘要:Machine learning algorithms train models from patterns of input data and target outputs, with the goal of predicting correct outputs for unseen test inputs. Here we demonstrate a problem of machine learning in vital application areas such as medical informatics or patent law that consists of the inclusion of measurements on which target outputs are deterministically defined in the representations of input data. This leads to perfect, but circular predictions based on a machine reconstruction of the known target definition, but fails on real-world data where the defining measurements may not or only incompletely be available. We present a circularity test that shows, for given datasets and black-box machine learning models, whether the target functional definition can be reconstructed and has been used in training. We argue that a transfer of research results to real-world applications requires to avoid circularity by separating measurements that define target outcomes from data representations in machine learning.
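
循环性问题可以用一个最小化的数值例子来体会:当某个输入列确定性地定义了目标时,包含它的模型拟合近乎完美,去掉它则拟合崩溃。以下 Python 草图是假设性的线性版本(论文的检验更一般),仅演示这种探测思路:

```python
import numpy as np

def r_squared(X, y):
    """带截距的最小二乘拟合的 R^2。"""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return 1.0 - (y - A @ coef).var() / y.var()

def circularity_flags(X, y, threshold=0.999):
    """标记"删掉后近乎完美的拟合即告崩溃"的特征列。
    仅探测线性重构;论文的循环性检验面向黑盒模型,更一般。"""
    full = r_squared(X, y)
    flags = []
    for j in range(X.shape[1]):
        reduced = r_squared(np.delete(X, j, axis=1), y)
        flags.append(full > threshold and reduced < threshold)
    return full, flags

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = 2.0 * X[:, 0] + 1.0        # 目标由第 0 列确定性地"定义"——循环性来源
full, flags = circularity_flags(X, y)
```

在这个虚构数据上,只有第 0 列会被标记为循环性来源。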

【2】 Diabetic Retinopathy Detection using Ensemble Machine Learning 标题:基于集成机器学习的糖尿病视网膜病变检测

作者:Israa Odeh,Mouhammd Alkasassbeh,Mohammad Alauthman 机构:Department of Computer Science, PSUT, Amman, Jordan, Department of Information Security, University of Petra 链接:https://arxiv.org/abs/2106.12545 摘要:糖尿病视网膜病变(DR)是全球糖尿病患者视力丧失的主要原因之一。DR是一种影响眼视网膜的微血管疾病,会引起血管阻塞,从而切断视网膜组织的主要营养来源。这种视觉障碍在早期被发现时治疗最为有效,因为重度DR会导致不可逆的失明。然而,DR的鉴别需要眼科医生的专业知识,往往昂贵且耗时。因此,人们引入了自动检测系统,旨在简化识别过程,使其以省时、低成本的方式在全球范围内可用。然而,由于这一眼病的可靠数据集和医疗记录有限,所获得的预测准确率相对而言还不足以让眼科专家将其作为诊断系统加以依赖。因此,我们探索了一种基于集成的学习策略,在一个复杂的诊断模型中融合了多种知名分类算法。该框架在该领域所有其他常用分类算法中取得了最高的准确率。我们利用InfoGainEval.和WrapperSubsetEval.从Messidor数据集中分别选出前5和前10个特征,生成了4个子数据集;在InfoGainEval.前5特征子集和原始数据集上分别获得了70.7%和75.1%的准确率。结果表明,子数据集性能令人印象深刻,并显著降低了分类过程的复杂度。 摘要:Diabetic Retinopathy (DR) is among the world's leading vision loss causes in diabetic patients. DR is a microvascular disease that affects the eye retina, which causes vessel blockage and therefore cuts the main source of nutrition for the retina tissues. Treatment for this visual disorder is most effective when it is detected in its earliest stages, as severe DR can result in irreversible blindness. Nonetheless, DR identification requires the expertise of Ophthalmologists which is often expensive and time-consuming. Therefore, automatic detection systems were introduced aiming to facilitate the identification process, making it available globally in a time and cost-efficient manner. However, due to the limited reliable datasets and medical records for this particular eye disease, the obtained predictions accuracies were relatively unsatisfying for eye specialists to rely on them as diagnostic systems. Thus, we explored an ensemble-based learning strategy, merging a substantial selection of well-known classification algorithms in one sophisticated diagnostic model. The proposed framework achieved the highest accuracy rates among all other common classification algorithms in the area. 4 subdatasets were generated to contain the top 5 and top 10 features of the Messidor dataset, selected by InfoGainEval. and WrapperSubsetEval., accuracies of 70.7% and 75.1% were achieved on the InfoGainEval. top 5 and original dataset respectively. The results imply the impressive performance of the subdataset, which significantly conduces to a less complex classification process.
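
作为说明,InfoGainEval. 风格的特征排序背后就是按信息增益打分。下面是一个自包含的纯 Python 示意(玩具数据与函数名均为本文虚构,仅演示准则本身):

```python
import math
from collections import Counter

def entropy(labels):
    """离散标签的香农熵(以 2 为底)。"""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def info_gain(feature, labels):
    """单个离散特征相对类别标签的信息增益,
    即 InfoGainEval. 式特征排序背后的准则(仅作示意)。"""
    n = len(labels)
    cond = 0.0
    for v in set(feature):
        sub = [labels[i] for i, f in enumerate(feature) if f == v]
        cond += len(sub) / n * entropy(sub)
    return entropy(labels) - cond

# 虚构的筛查数据:特征 A 完全决定标签,特征 B 几乎无信息。
labels = [1, 1, 1, 0, 0, 0]
feat_a = ["hi", "hi", "hi", "lo", "lo", "lo"]
feat_b = ["x", "y", "x", "y", "x", "y"]
gain_a = info_gain(feat_a, labels)
gain_b = info_gain(feat_b, labels)
```

按 gain 从高到低排序、取前 k 个特征,即可得到文中"前5/前10特征"式的子数据集。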

【3】 Innovations Autoencoder and its Application in Real-Time Anomaly Detection 标题:新息自动编码器及其在实时异常检测中的应用

作者:Xinyi Wang,Lang Tong 机构:School of Electrical and Computer Engineering, Cornell University, Ithaca, NY , USA 链接:https://arxiv.org/abs/2106.12382 摘要:时间序列的新息序列是一列独立同分布的随机变量,原始时间序列对其具有因果表示。任一时刻的新息在统计上独立于该时间序列的先前历史,因此它代表的是当前所含而过去不含的新信息。由于其简单的概率结构,新息序列是原始序列最有效的表征。与主成分/独立成分分析(PCA/ICA)表示不同,新息序列不仅保留了原始时间序列完整的统计性质,还保留了其时间顺序。一个长期悬而未决的问题是寻找一种计算上易于处理的方法来提取非高斯过程的新息序列。本文提出了一种称为新息自动编码器(IAE)的深度学习方法,利用因果卷积神经网络提取新息序列。文中还介绍了IAE在异常模型与无异常模型均未知的非参数异常检测中的应用。 摘要:An innovations sequence of a time series is a sequence of independent and identically distributed random variables with which the original time series has a causal representation. The innovation at a time is statistically independent of the prior history of the time series. As such, it represents the new information contained at present but not in the past. Because of its simple probability structure, an innovations sequence is the most efficient signature of the original. Unlike the principal or independent component analysis (PCA/ICA) representations, an innovations sequence preserves not only the complete statistical properties but also the temporal order of the original time series. A long-standing open problem is to find a computationally tractable way to extract an innovations sequence of non-Gaussian processes. This paper presents a deep learning approach, referred to as Innovations Autoencoder (IAE), that extracts innovations sequences using a causal convolutional neural network. An application of IAE to nonparametric anomaly detection with unknown anomaly and anomaly-free models is also presented.
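
对线性高斯过程,新息序列就是一步预测残差,这可以作为理解 IAE 目标的最小示例(以下 numpy 草图以 AR(1) 为例,属本文构造的演示;论文的 IAE 用因果卷积网络把这一映射推广到一般非高斯过程):

```python
import numpy as np

rng = np.random.default_rng(2)
# AR(1) 过程:x_t = 0.8 * x_{t-1} + e_t,e_t 即独立同分布的新息。
n, phi = 5000, 0.8
e = rng.normal(size=n)
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + e[t]

# 线性高斯情形下,新息就是一步预测残差。
phi_hat = np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])  # 最小二乘估计
innov = x[1:] - phi_hat * x[:-1]

def lag1_corr(v):
    """滞后 1 自相关,用来核验新息近似不相关。"""
    return np.corrcoef(v[:-1], v[1:])[0, 1]
```

原序列 x 有强自相关,而提取出的 innov 的滞后自相关应接近 0——这正是"新息独立于历史"的数值体现。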

分类|识别(4篇)

【1】 Who Leads and Who Follows in Strategic Classification? 标题:在战略分类中,谁是领导者,谁是跟随者?

作者:Tijana Zrnic,Eric Mazumdar,S. Shankar Sastry,Michael I. Jordan 机构:University of California, Berkeley 链接:https://arxiv.org/abs/2106.12529 摘要:随着预测模型被部署到现实世界中,它们必须越来越多地与战略行为抗衡。越来越多的战略分类研究将这一问题视为Stackelberg博弈:决策者通过部署一个模型来“领导”博弈,而战略代理则通过对部署的模型做出最佳反应来“跟随”。重要的是,在这个框架中,学习的负担完全放在决策者身上,而代理人的最佳反应被隐含地视为瞬时的。在这项工作中,我们认为,战略分类中的游戏顺序基本上是由决策者和代理适应彼此行为的相对频率决定的。特别是,通过将标准模型推广到允许两个参与者随着时间的推移进行学习,我们证明了比代理更新更快的决策者可以反转游戏顺序,这意味着代理主导,决策者跟随。我们观察到,在标准的学习环境中,这种角色转换对于决策者和战略代理人都是可取的。最后,我们证明了一个决策者可以自由选择其更新频率,可以诱导学习动力学收敛到Stackelberg均衡与任何顺序的发挥。 摘要:As predictive models are deployed into the real world, they must increasingly contend with strategic behavior. A growing body of work on strategic classification treats this problem as a Stackelberg game: the decision-maker "leads" in the game by deploying a model, and the strategic agents "follow" by playing their best response to the deployed model. Importantly, in this framing, the burden of learning is placed solely on the decision-maker, while the agents' best responses are implicitly treated as instantaneous. In this work, we argue that the order of play in strategic classification is fundamentally determined by the relative frequencies at which the decision-maker and the agents adapt to each other's actions. In particular, by generalizing the standard model to allow both players to learn over time, we show that a decision-maker that makes updates faster than the agents can reverse the order of play, meaning that the agents lead and the decision-maker follows. We observe in standard learning settings that such a role reversal can be desirable for both the decision-maker and the strategic agents. Finally, we show that a decision-maker with the freedom to choose their update frequency can induce learning dynamics that converge to Stackelberg equilibria with either order of play.
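
代理对已发布线性规则的"最佳反应"一步,可以用一维玩具模型示意(假设性设定:被接受的效用为 1,移动代价与位移成正比;这只演示 Stackelberg 交互中的"跟随"步骤,并非论文的学习动力学):

```python
import numpy as np

def best_respond(x, w, b, cost=0.5):
    """代理对已发布线性规则 sign(w*x+b) 的最佳反应(一维示意):
    当"被接受"的效用 1 超过移动代价 cost*|位移| 时,恰好越过边界。"""
    score = w * x + b
    gap = -score / w                                  # 到决策边界的位移
    move = (score < 0) & (np.abs(gap) * cost < 1.0)
    return np.where(move, x + gap + 1e-6, x)

rng = np.random.default_rng(3)
x = rng.normal(size=1000)
w, b = 1.0, -0.5                                      # 规则:x > 0.5 则接受
before = np.mean(w * x + b > 0)                       # 代理反应前的接受率
after = np.mean(w * best_respond(x, w, b) + b > 0)    # 反应后的接受率
```

论文讨论的正是:当代理的这类更新比决策者更快时,博弈顺序会反转,由代理"领导"。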

【2】 Beyond Predictions in Neural ODEs: Identification and Interventions 标题:神经学颂歌中的超越预测:识别与干预

作者:Hananeh Aliee,Fabian J. Theis,Niki Kilbertus 机构:TUM, Helmholtz Center, Munich 链接:https://arxiv.org/abs/2106.12430 摘要:在模式匹配和预测任务取得巨大成功的推动下,研究人员越来越多地借助机器学习来帮助原始的科学发现。有了大量关于一个系统的观测数据,我们能揭示其演化的规律吗?解决这项任务有很大的希望,充分了解因果关系,并能够作出可靠的预测系统的行为下的干预。对于由常微分方程组(ODEs)生成的时间序列数据,我们朝着回答这个问题迈出了一步。虽然控制常微分方程可能无法单独从数据中识别,但我们表明,将简单的正则化方案与灵活的神经常微分方程相结合,可以从时间序列数据中稳健地恢复动力学和因果结构。我们在各种(非)线性一阶和二阶系统以及真实数据上的结果验证了我们的方法。最后我们还证明,在对变量或系统本身进行干预的情况下,同样可以做出准确的预测。 摘要:Spurred by tremendous success in pattern matching and prediction tasks, researchers increasingly resort to machine learning to aid original scientific discovery. Given large amounts of observational data about a system, can we uncover the rules that govern its evolution? Solving this task holds the great promise of fully understanding the causal interactions and being able to make reliable predictions about the system's behavior under interventions. We take a step towards answering this question for time-series data generated from systems of ordinary differential equations (ODEs). While the governing ODEs might not be identifiable from data alone, we show that combining simple regularization schemes with flexible neural ODEs can robustly recover the dynamics and causal structures from time-series data. Our results on a variety of (non)-linear first and second order systems as well as real data validate our method. We conclude by showing that we can also make accurate predictions under interventions on variables or the system itself.
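
"简单正则化 + 灵活模型恢复动力学"的思路,可以退化到线性情形来演示:用带 L1 正则的回归(ISTA)从轨迹数据恢复 dx/dt = A x 中的稀疏矩阵 A。以下为本文构造的示意(论文用的是神经 ODE 处理非线性 f,此处仅为线性替身):

```python
import numpy as np

def recover_dynamics(X, dX, lam=0.005, lr=1.0, steps=1000):
    """用带 L1 正则的回归(ISTA)从轨迹恢复线性动力学 dx/dt = A x。
    这是"简单正则化 + 灵活模型"的线性退化版,仅作示意。"""
    n, d = X.shape
    A = np.zeros((d, d))
    for _ in range(steps):
        grad = (X @ A.T - dX).T @ X / n               # 最小二乘梯度
        A = A - lr * grad
        A = np.sign(A) * np.maximum(np.abs(A) - lr * lam, 0.0)  # 软阈值
    return A

# 生成弱阻尼旋转系统的轨迹:dx/dt = A_true x。
A_true = np.array([[0.0, 1.0], [-1.0, -0.1]])
dt, T = 0.01, 2000
X = np.zeros((T, 2))
X[0] = [1.0, 0.0]
for t in range(T - 1):
    X[t + 1] = X[t] + dt * (A_true @ X[t])            # 欧拉积分
dX = np.gradient(X, dt, axis=0)                       # 数值微分
A_hat = recover_dynamics(X, dX)
```

正则化把不存在的耦合压为零,这对应论文中"恢复因果结构"的直觉:非零项的位置即变量间的依赖图。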

【3】 Finding Phish in a Haystack: A Pipeline for Phishing Classification on Certificate Transparency Logs 标题:在草堆中查找网络钓鱼:一种基于证书透明日志的网络钓鱼分类管道

作者:Arthur Drichel,Vincent Drury,Justus von Brandt,Ulrike Meyer 机构:RWTH Aachen University 备注:None 链接:https://arxiv.org/abs/2106.12343 摘要:当前流行的网络钓鱼防范技术主要利用反应式阻止列表,这给攻击者留下了一个“机会之窗”,在此期间受害者没有受到保护。缩短此窗口的一种可能方法旨在通过监视证书透明性(CT)日志,在网站准备过程中更早地检测钓鱼攻击。以前曾尝试使用CT日志数据进行网络钓鱼分类,但缺乏对实际CT日志数据的评估。在本文中,我们提出了一个管道,通过解决处理CT测井数据时的一些问题来促进此类评估。管道包括数据集创建、训练以及CT日志的过去或当前分类。它的模块化结构使得可以很容易地交换分类器或验证源,以支持地面真值标注工作和分类器比较。我们在一些新的和现有的分类器上测试了这个管道,并发现了在将来改进分类器的一般潜力。我们发布了管道的源代码和使用的数据集(https://gitlab.com/rwth-itsec/ctl-pipeline)从而使这一方向的未来研究更容易进行。 摘要:Current popular phishing prevention techniques mainly utilize reactive blocklists, which leave a ``window of opportunity'' for attackers during which victims are unprotected. One possible approach to shorten this window aims to detect phishing attacks earlier, during website preparation, by monitoring Certificate Transparency (CT) logs. Previous attempts to work with CT log data for phishing classification exist, however they lack evaluations on actual CT log data. In this paper, we present a pipeline that facilitates such evaluations by addressing a number of problems when working with CT log data. The pipeline includes dataset creation, training, and past or live classification of CT logs. Its modular structure makes it possible to easily exchange classifiers or verification sources to support ground truth labeling efforts and classifier comparisons. We test the pipeline on a number of new and existing classifiers, and find a general potential to improve classifiers for this scenario in the future. We publish the source code of the pipeline and the used datasets along with this paper (https://gitlab.com/rwth-itsec/ctl-pipeline), thus making future research in this direction more accessible.

【4】 Deep Neural Network Based Respiratory Pathology Classification Using Cough Sounds 标题:基于深度神经网络的咳嗽音呼吸系统病理分类

作者:Balamurali B T,Hwan Ing Hee,Saumitra Kapoor,Oon Hoe Teoh,Sung Shin Teng,Khai Pin Lee,Dorien Herremans,Jer Ming Chen 链接:https://arxiv.org/abs/2106.12174 摘要:智能系统正在改变世界,也在改变我们的医疗体系。我们提出了一个基于深度学习的咳嗽声分类模型,可以区分健康儿童的咳嗽与患有哮喘、上呼吸道感染(URTI)和下呼吸道感染(LRTI)等疾病儿童的病理性咳嗽。为了训练深度神经网络模型,我们收集了一个新的、带有临床医生诊断标签的咳嗽声数据集。所选模型是基于Mel频率倒谱系数(MFCC)特征的双向长短时记忆网络(BiLSTM)。在把咳嗽分为健康与病理(总体病理或某种特定呼吸道病理)两类时,训练出的模型按医生诊断给出的标签分类的准确率超过84%。为了判断受试者的呼吸道病理状况,将每个受试者多个咳嗽时段的结果结合起来,三种呼吸道疾病的预测准确率均超过91%。然而,当模型被训练来区分四类咳嗽时,总体准确率下降:某一类病理性咳嗽常被误判为另一类。不过,如果将"健康咳嗽被判为健康"和"病理性咳嗽被判为患有某种病理"都视为正确,则四类模型的总体准确率仍在84%以上。对同一批受试者患病期与康复期咳嗽的MFCC特征空间进行纵向比较发现,无论基础疾病为何,病理性咳嗽都占据相同的特征空间,因此仅凭MFCC特征难以区分。 摘要:Intelligent systems are transforming the world, as well as our healthcare system. We propose a deep learning-based cough sound classification model that can distinguish between children with healthy versus pathological coughs such as asthma, upper respiratory tract infection (URTI), and lower respiratory tract infection (LRTI). In order to train a deep neural network model, we collected a new dataset of cough sounds, labelled with clinician's diagnosis. The chosen model is a bidirectional long-short term memory network (BiLSTM) based on Mel Frequency Cepstral Coefficients (MFCCs) features. The resulting trained model when trained for classifying two classes of coughs -- healthy or pathology (in general or belonging to a specific respiratory pathology), reaches accuracy exceeding 84\% when classifying cough to the label provided by the physicians' diagnosis. In order to classify subject's respiratory pathology condition, results of multiple cough epochs per subject were combined. The resulting prediction accuracy exceeds 91\% for all three respiratory pathologies. However, when the model is trained to classify and discriminate among the four classes of coughs, overall accuracy dropped: one class of pathological coughs is often misclassified as another. However, if one considers the healthy cough classified as healthy and pathological cough classified to have some kind of pathologies, then the overall accuracy of the four-class model is above 84\%. A longitudinal study of MFCC feature space when comparing pathological and recovered coughs collected from the same subjects revealed the fact that pathological cough irrespective of the underlying conditions occupy the same feature space making it harder to differentiate only using MFCC features.

3D|3D重建等相关(1篇)

【1】 The Neurally-Guided Shape Parser: A Monte Carlo Method for Hierarchical Labeling of Over-segmented 3D Shapes 标题:神经引导的形状解析器:超分割三维形状分层标注的蒙特卡罗方法

作者:R. Kenny Jones,Rana Hanocka,Daniel Ritchie 机构:Brown University, University of Chicago 链接:https://arxiv.org/abs/2106.12026 摘要:许多基于学习的三维形状语义分割方法,用端到端训练的单遍(single-pass)方法将标签分配给形状原子(例如点云中的点或网格中的面)。这种方法取得了令人印象深刻的性能,但需要大量的标记训练数据。这种范式纠缠着两个可分离的子问题:(1)将形状分解为区域;(2)为这些区域分配语义标签。我们声称,解开这些子问题可以减少标记数据的负担:(1)区域分解不需要语义标记,可以在无监督的方式下执行;(2)标记形状区域而不是原子会导致较小的搜索空间,应该可以用较少的标记训练数据进行学习。在本文中,我们通过介绍神经引导形状分析器(NGSP)来研究第二种说法,NGSP是一种学习如何为过度分割的3D形状区域分配语义标签的方法。我们通过最大后验(MAP)推理来解决这个问题,对以输入形状为条件的标签分配后验概率进行建模。我们采用了一种由神经提议网络引导的蒙特卡罗重要性采样方法;这种基于搜索的方法因假设输入形状已被分解为离散区域而变得可行。 摘要:Many learning-based 3D shape semantic segmentation methods assign labels to shape atoms (e.g. points in a point cloud or faces in a mesh) with a single-pass approach trained in an end-to-end fashion. Such methods achieve impressive performance but require large amounts of labeled training data. This paradigm entangles two separable subproblems: (1) decomposing a shape into regions and (2) assigning semantic labels to these regions. We claim that disentangling these subproblems reduces the labeled data burden: (1) region decomposition requires no semantic labels and could be performed in an unsupervised fashion, and (2) labeling shape regions instead of atoms results in a smaller search space and should be learnable with less labeled training data. In this paper, we investigate this second claim by presenting the Neurally-Guided Shape Parser (NGSP), a method that learns how to assign semantic labels to regions of an over-segmented 3D shape. We solve this problem via MAP inference, modeling the posterior probability of a labeling assignment conditioned on an input shape. We employ a Monte Carlo importance sampling approach guided by a neural proposal network, a search-based approach made feasible by assuming the input shape is decomposed into discrete regions.
We evaluate NGSP on the task of hierarchical semantic segmentation on manufactured 3D shapes from PartNet. We find that NGSP delivers significant performance improvements over baselines that learn to label shape atoms and then aggregate predictions for each shape region, especially in low-data regimes. Finally, we demonstrate that NGSP is robust to region granularity, as it maintains strong segmentation performance even as the regions undergo significant corruption.

编码器(1篇)

【1】 Stable, Fast and Accurate: Kernelized Attention with Relative Positional Encoding 标题:稳定、快速、准确:相对位置编码的核心注意力

作者:Shengjie Luo,Shanda Li,Tianle Cai,Di He,Dinglan Peng,Shuxin Zheng,Guolin Ke,Liwei Wang,Tie-Yan Liu 机构:Peking University, Princeton University, University of Science and Technology of China, Microsoft Research 备注:Preprint. Work in Progress 链接:https://arxiv.org/abs/2106.12566 摘要:注意模块是Transformer的关键组成部分,由于其二次复杂度,无法高效地扩展到长序列。许多工作集中于逼近原始注意力中先点积、后取指数(dot-then-exponentiate)的softmax函数,从而得到次二次甚至线性复杂度的Transformer结构。然而,我们发现这些方法不能应用于超越这种先点积、后取指数风格的更强大的注意模块,例如带相对位置编码(RPE)的Transformer。由于许多最先进的模型默认使用相对位置编码,设计能够结合RPE的高效Transformer很有吸引力。本文提出了一种新方法,在核化注意力的基础上加速带RPE的Transformer的注意力计算。基于相对位置编码构成Toeplitz矩阵这一观察,我们从数学上证明了带RPE的核化注意力可以用快速傅立叶变换(FFT)高效计算。借助FFT,我们的方法达到了$\mathcal{O}(n\log n)$的时间复杂度。 摘要:The attention module, which is a crucial component in Transformer, cannot scale efficiently to long sequences due to its quadratic complexity. Many works focus on approximating the dot-then-exponentiate softmax function in the original attention, leading to sub-quadratic or even linear-complexity Transformer architectures. However, we show that these methods cannot be applied to more powerful attention modules that go beyond the dot-then-exponentiate style, e.g., Transformers with relative positional encoding (RPE). Since in many state-of-the-art models, relative positional encoding is used as default, designing efficient Transformers that can incorporate RPE is appealing. In this paper, we propose a novel way to accelerate attention calculation for Transformers with RPE on top of the kernelized attention. Based upon the observation that relative positional encoding forms a Toeplitz matrix, we mathematically show that kernelized attention with RPE can be calculated efficiently using Fast Fourier Transform (FFT). With FFT, our method achieves $\mathcal{O}(n\log n)$ time complexity.
Interestingly, we further demonstrate that properly using relative positional encoding can mitigate the training instability problem of vanilla kernelized attention. On a wide range of tasks, we empirically show that our models can be trained from scratch without any optimization issues. The learned model performs better than many efficient Transformer variants and is faster than standard Transformer in the long-sequence regime.
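
论文的核心代数事实——Toeplitz 矩阵与向量的乘积可借助循环嵌入与 FFT 在 $\mathcal{O}(n\log n)$ 内完成——可以直接用 numpy 验证(以下为独立示意,不涉及论文的注意力实现细节):

```python
import numpy as np

def toeplitz_matvec(first_col, first_row, v):
    """O(n log n) 的 Toeplitz 矩阵-向量乘法:把 Toeplitz 矩阵嵌入
    2n 阶循环矩阵,循环卷积再用 FFT 对角化——正是论文加速带 RPE
    的核化注意力所依赖的代数事实。"""
    n = len(v)
    # 循环嵌入:[c_0..c_{n-1}, *, r_{n-1}..r_1],占位元素取 0
    c = np.concatenate([first_col, [0.0], first_row[:0:-1]])
    prod = np.fft.ifft(np.fft.fft(c) * np.fft.fft(v, 2 * n))
    return prod[:n].real

rng = np.random.default_rng(4)
n = 64
col = rng.normal(size=n)                                  # T[i,0]
row = np.concatenate([[col[0]], rng.normal(size=n - 1)])  # T[0,j]
v = rng.normal(size=n)
T = np.array([[col[i - j] if i >= j else row[j - i] for j in range(n)]
              for i in range(n)])
fast = toeplitz_matvec(col, row, v)
direct = T @ v                                            # O(n^2) 的直接乘法
```

两种算法的结果应在浮点精度内一致,而 FFT 版本把每次乘法从 O(n^2) 降到 O(n log n)。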

优化|敛散性(4篇)

【1】 Bregman Gradient Policy Optimization 标题:Bregman梯度策略优化

作者:Feihu Huang,Shangqian Gao,Heng Huang 机构:Department of Electrical and Computer Engineering, University of Pittsburgh, Pittsburgh, USA, Editor: 备注:18 pages, 3 pages 链接:https://arxiv.org/abs/2106.12112 摘要:在本文中,我们基于Bregman散度和动量技术,为强化学习设计了一个新的Bregman梯度策略优化框架。具体地,我们提出了一种基于基本动量技术和镜像下降迭代的Bregman梯度策略优化算法(BGPO)。同时,我们提出了一种基于动量方差缩减技术的加速Bregman梯度策略优化算法(VR-BGPO)。此外,我们还为非凸条件下的Bregman梯度策略优化建立了收敛性分析框架。具体而言,我们证明了BGPO在每次迭代只需一条轨迹的情况下,以$\tilde{O}(\epsilon^{-4})$的样本复杂度找到$\epsilon$平稳点;而VR-BGPO同样每次迭代只需一条轨迹,即可达到目前已知最优的$\tilde{O}(\epsilon^{-3})$样本复杂度。特别地,通过使用不同的Bregman散度,我们的方法统一了许多现有的策略优化算法及其新的变体,如现有的(方差缩减)策略梯度算法和(方差缩减)自然策略梯度算法。在多个强化学习任务上的大量实验结果证明了新算法的有效性。 摘要:In this paper, we design a novel Bregman gradient policy optimization framework for reinforcement learning based on Bregman divergences and momentum techniques. Specifically, we propose a Bregman gradient policy optimization (BGPO) algorithm based on the basic momentum technique and mirror descent iteration. At the same time, we present an accelerated Bregman gradient policy optimization (VR-BGPO) algorithm based on a momentum variance-reduced technique. Moreover, we introduce a convergence analysis framework for our Bregman gradient policy optimization under the nonconvex setting. Specifically, we prove that BGPO achieves the sample complexity of $\tilde{O}(\epsilon^{-4})$ for finding $\epsilon$-stationary point only requiring one trajectory at each iteration, and VR-BGPO reaches the best known sample complexity of $\tilde{O}(\epsilon^{-3})$ for finding an $\epsilon$-stationary point, which also only requires one trajectory at each iteration. In particular, by using different Bregman divergences, our methods unify many existing policy optimization algorithms and their new variants such as the existing (variance-reduced) policy gradient algorithms and (variance-reduced) natural policy gradient algorithms.
Extensive experimental results on multiple reinforcement learning tasks demonstrate the efficiency of our new algorithms.
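
BGPO 的基本构件是镜像下降迭代。下面在概率单纯形上用负熵镜像映射(其 Bregman 散度即 KL 散度)演示这一迭代;这是一个示意性的凸问题,并非论文的策略梯度目标:

```python
import numpy as np

def mirror_descent_simplex(grad, x0, lr=0.1, steps=200):
    """概率单纯形上的镜像下降:负熵镜像映射对应的 Bregman 散度即
    KL 散度,更新化为指数梯度 + 归一化(exponentiated gradient)。"""
    x = x0.copy()
    for _ in range(steps):
        x = x * np.exp(-lr * grad(x))   # 对偶空间的梯度步
        x /= x.sum()                    # 回到单纯形
    return x

# 在单纯形上最小化 <c, x>:最优解把全部质量放在 argmin(c) 上。
c = np.array([3.0, 1.0, 2.0])
x = mirror_descent_simplex(lambda x: c, np.ones(3) / 3)
```

换用不同的镜像映射(即不同的 Bregman 散度)便得到不同的更新规则,这正是论文"统一多种策略优化算法"的机制所在。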

【2】 Near-Optimal Linear Regression under Distribution Shift 标题:分布偏移下的近最优线性回归

作者:Qi Lei,Wei Hu,Jason D. Lee 机构:‡ 备注:ICML 2021 链接:https://arxiv.org/abs/2106.12108 摘要:当源域数据充足而目标域标记数据稀缺时,迁移学习必不可少。我们针对分布偏移下的线性回归问题,提出了达到极小极大线性风险的估计量。我们的算法涵盖协变量偏移和模型偏移等不同的迁移学习设定,并同时考虑数据由线性或一般非线性模型生成的情形。我们证明,对于各种源/目标分布,即使与非线性估计量相比,线性极小极大估计量的风险也在极小极大风险的绝对常数倍以内。 摘要:Transfer learning is essential when sufficient data comes from the source domain, with scarce labeled data from the target domain. We develop estimators that achieve minimax linear risk for linear regression problems under distribution shift. Our algorithms cover different transfer learning settings including covariate shift and model shift. We also consider when data are generated from either linear or general nonlinear models. We show that linear minimax estimators are within an absolute constant of the minimax risk even among nonlinear estimators for various source/target distributions.
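
协变量偏移下的一个经典估计量是按目标/源密度比加权的最小二乘。以下 numpy 草图演示该思路(注意:这只是常见基线之一,论文给出的极小极大最优估计量未必是它;数据与密度比均为本文虚构):

```python
import numpy as np

def weighted_lstsq(X, y, w):
    """重要性加权最小二乘:按目标/源密度比 w 重加权源样本。"""
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

rng = np.random.default_rng(5)
# 源协变量 ~ N(0,1),目标协变量 ~ N(2,1);真实关系 y = 1.5x + 0.5。
x = rng.normal(size=500)
y = 1.5 * x + 0.5 + rng.normal(scale=0.05, size=500)
X = np.column_stack([x, np.ones_like(x)])
ratio = np.exp(2.0 * x - 2.0)          # N(2,1)/N(0,1) 的密度比
beta = weighted_lstsq(X, y, ratio)
```

加权把拟合重心移向目标分布所在的区域,从而控制目标域(而非源域)上的风险。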

【3】 Regret-optimal Estimation and Control 标题:后悔--最优估计与控制

作者:Gautam Goel,Babak Hassibi 机构:Gautam Goel is with the Department of Computing and Mathematical Sciences at Caltech (e-mail, †Babak Hassibi is with the Department of Electrical Engineering at Caltech (e-mail 链接:https://arxiv.org/abs/2106.12097 摘要:我们从后悔最小化的角度考虑线性时变动力系统的估计与控制。与以往工作不同,我们着重于设计与具有先见之明的(clairvoyant)非因果策略竞争的因果估计器和控制器,而不是与事后从某个固定参数类中选出的最佳策略竞争。我们证明了利用鲁棒控制中的算子理论技术可以导出状态空间形式的后悔最优估计器和后悔最优控制器,并根据扰动能量给出了算法所产生后悔的紧的、依赖于数据的界。我们的结果可以看作是把着眼于最小化最坏情况代价的传统鲁棒估计与控制,扩展到最小化最坏情况后悔。针对非线性动力系统,我们提出了模型预测控制(MPC)和扩展卡尔曼滤波器(EKF)的后悔最优对应算法,数值实验表明,我们的后悔最优算法能显著优于标准的估计与控制方法。 摘要:We consider estimation and control in linear time-varying dynamical systems from the perspective of regret minimization. Unlike most prior work in this area, we focus on the problem of designing causal estimators and controllers which compete against a clairvoyant noncausal policy, instead of the best policy selected in hindsight from some fixed parametric class. We show that the regret-optimal estimator and regret-optimal controller can be derived in state-space form using operator-theoretic techniques from robust control and present tight, data-dependent bounds on the regret incurred by our algorithms in terms of the energy of the disturbances. Our results can be viewed as extending traditional robust estimation and control, which focuses on minimizing worst-case cost, to minimizing worst-case regret. We propose regret-optimal analogs of Model-Predictive Control (MPC) and the Extended Kalman Filter (EKF) for systems with nonlinear dynamics and present numerical experiments which show that our regret-optimal algorithms can significantly outperform standard approaches to estimation and control.

【4】 The Rate of Convergence of Variation-Constrained Deep Neural Networks 标题:变差约束深度神经网络的收敛速度

作者:Gen Li,Yuantao Gu,Jie Ding 机构:Gen Li and Yuantao Gu are with the Department of Electronic Engineering, Tsinghua University; Jie Ding is with the School of Statistics, University of Minnesota Twin Cities 链接:https://arxiv.org/abs/2106.12068 摘要:多层前馈网络被用来逼近各种非线性函数。一个重要而基本的问题是通过网络模型的统计风险或对未来数据的预期预测误差来理解其可学习性。据我们所知,现有工作所显示的神经网络的收敛速度在$n$样本量下最多为$n^{-1/4}$量级。本文证明了一类具有任意宽度的变分约束神经网络,对于任意小的正常数$\delta$,可以获得接近参数速率$n^{-1/2+\delta}$。它相当于均方误差下的$n^{-1+2\delta}$。数值实验也观察到了这个速率。结果表明,逼近光滑函数所需的神经函数空间可能不像人们通常所认为的那样大。我们的结果还揭示了当神经元数目和学习参数以$n$或超过$n$快速增长时,深层神经网络不容易出现过度拟合的现象。我们还讨论了其他网络参数的收敛速度,包括输入维数、网络层和系数范数。 摘要:Multi-layer feedforward networks have been used to approximate a wide range of nonlinear functions. An important and fundamental problem is to understand the learnability of a network model through its statistical risk, or the expected prediction error on future data. To the best of our knowledge, the rate of convergence of neural networks shown by existing works is bounded by at most the order of $n^{-1/4}$ for a sample size of $n$. In this paper, we show that a class of variation-constrained neural networks, with arbitrary width, can achieve near-parametric rate $n^{-1/2+\delta}$ for an arbitrarily small positive constant $\delta$. It is equivalent to $n^{-1 +2\delta}$ under the mean squared error. This rate is also observed by numerical experiments. The result indicates that the neural function space needed for approximating smooth functions may not be as large as what is often perceived. Our result also provides insight to the phenomena that deep neural networks do not easily suffer from overfitting when the number of neurons and learning parameters rapidly grow with $n$ or even surpass $n$. We also discuss the rate of convergence regarding other network parameters, including the input dimension, network layer, and coefficient norm.

预测|估计(5篇)

【1】 Forecasting Health and Wellbeing for Shift Workers Using Job-role Based Deep Neural Network 标题:基于工作角色的深度神经网络在轮班工人健康福利预测中的应用

作者:Han Yu,Asami Itoh,Ryota Sakamoto,Motomu Shimaoka,Akane Sano 机构:Sano, Rice University, Houston TX , USA, Mie University, Mie ,-, Japan 备注:In: Wireless Mobile Communication and Healthcare. MobiHealth 2020 链接:https://arxiv.org/abs/2106.12081 摘要:轮班工人是我们社会的重要贡献者,他们面临着健康和福利不佳的高风险。为了帮助他们解决问题,我们收集并分析了轮班护士和医生的生理和行为可穿戴传感器数据,以及他们的行为问卷数据和他们自我报告的日常健康和幸福标签,包括警觉性、幸福感、能量、健康和压力。我们发现护士和医生的反应既有相同之处,也有不同之处。根据护士和医生自我报告的健康和幸福感标签的差异以及标签之间的相关性,我们提出了基于工作角色的多任务多标签深度学习模型,我们同时为护士和医生模拟生理和行为数据,预测参与者第二天的多维自我报告健康和幸福状况。在二元/三级分类和回归预测任务的评估中,我们的模型表现出明显优于基线模型和以前的最新模型。我们还发现,与心率、睡眠和轮班有关的特征有助于轮班工人的健康和幸福。 摘要:Shift workers who are essential contributors to our society, face high risks of poor health and wellbeing. To help with their problems, we collected and analyzed physiological and behavioral wearable sensor data from shift working nurses and doctors, as well as their behavioral questionnaire data and their self-reported daily health and wellbeing labels, including alertness, happiness, energy, health, and stress. We found the similarities and differences between the responses of nurses and doctors. According to the differences in self-reported health and wellbeing labels between nurses and doctors, and the correlations among their labels, we proposed a job-role based multitask and multilabel deep learning model, where we modeled physiological and behavioral data for nurses and doctors simultaneously to predict participants' next day's multidimensional self-reported health and wellbeing status. Our model showed significantly better performances than baseline models and previous state-of-the-art models in the evaluations of binary/3-class classification and regression prediction tasks. We also found features related to heart rate, sleep, and work shift contributed to shift workers' health and wellbeing.

【2】 Towards Consistent Predictive Confidence through Fitted Ensembles 标题:通过拟合的系综走向一致的预测置信度

作者:Navid Kardan,Ankit Sharma,Kenneth O. Stanley 机构:Department of Computer Science, University of Central Florida, Orlando, USA, -,-,- 备注:IJCNN 2021 链接:https://arxiv.org/abs/2106.12070 摘要:深度神经网络是机器学习应用中许多近期成功的幕后功臣。然而,这些模型在遇到分布外(OOD)样本或做出错误预测时会给出过度自信的决策。这种不一致的预测置信度限制了将独立训练的学习模型集成到更大的系统中。本文引入可分离概念学习框架,以便在存在OOD样本的情况下真实地度量分类器的性能。在这种设置中,分类器的多个实例分别在类别集合某个划分的不同部分上训练,随后在单独的测试集上评估这些模型组合的性能。与当前的OOD检测技术不同,该框架既不需要辅助OOD数据集,也不把检测性能与分类性能分开。此外,我们提出了一个新的强基线——拟合集成(fitted ensembles)——以在深度模型中获得更一致的预测置信度,其中过度自信的预测会被原始分类任务的变换版本纠正。拟合集成通过观察各组件之间相互矛盾的预测,无需辅助数据即可自然地检测出OOD样本。在MNIST、SVHN、CIFAR-10/100和ImageNet上的实验表明,拟合集成在OOD样本上显著优于传统集成,并且具有可扩展性。 摘要:Deep neural networks are behind many of the recent successes in machine learning applications. However, these models can produce overconfident decisions while encountering out-of-distribution (OOD) examples or making a wrong prediction. This inconsistent predictive confidence limits the integration of independently-trained learning models into a larger system. This paper introduces separable concept learning framework to realistically measure the performance of classifiers in presence of OOD examples. In this setup, several instances of a classifier are trained on different parts of a partition of the set of classes. Later, the performance of the combination of these models is evaluated on a separate test set. Unlike current OOD detection techniques, this framework does not require auxiliary OOD datasets and does not separate classification from detection performance. Furthermore, we present a new strong baseline for more consistent predictive confidence in deep models, called fitted ensembles, where overconfident predictions are rectified by transformed versions of the original classification task. Fitted ensembles can naturally detect OOD examples without requiring auxiliary data by observing contradicting predictions among its components.
Experiments on MNIST, SVHN, CIFAR-10/100, and ImageNet show fitted ensemble significantly outperform conventional ensembles on OOD examples and are possible to scale.

【3】 Test-time Collective Prediction 标题:测试时间集合预测

作者:Celestine Mendler-Dünner,Wenshuo Guo,Stephen Bates,Michael I. Jordan 机构:University of California, Berkeley 链接:https://arxiv.org/abs/2106.12012 摘要:机器学习中一个越来越常见的设置涉及多方,每一方都有自己的数据,他们希望共同对未来的测试点做出预测。代理希望借助全体代理的集体专业知识做出比单独行动更好的预测,但可能不愿意公开其数据或模型参数。在这项工作中,我们探索了一种在测试时进行集体预测的去中心化机制,它利用每个代理预先训练好的模型,而不依赖外部验证、模型再训练或数据汇集。我们的方法从社会科学中关于人类共识形成的文献中获得启发。我们从理论上分析了该机制,表明它在大样本极限下收敛于逆均方误差(MSE)加权。为了计算集体预测的误差线,我们提出了一个去中心化的刀切(Jackknife)过程,评估该机制对单个代理预测的敏感性。在经验上,我们证明了该方案能有效地组合在输入空间上质量各异的模型。所提出的共识预测比经典的模型平均法获得了显著增益,甚至优于可以获得额外验证数据的加权平均方案。 摘要:An increasingly common setting in machine learning involves multiple parties, each with their own data, who want to jointly make predictions on future test points. Agents wish to benefit from the collective expertise of the full set of agents to make better predictions than they would individually, but may not be willing to release their data or model parameters. In this work, we explore a decentralized mechanism to make collective predictions at test time, leveraging each agent's pre-trained model without relying on external validation, model retraining, or data pooling. Our approach takes inspiration from the literature in social science on human consensus-making. We analyze our mechanism theoretically, showing that it converges to inverse mean-squared-error (MSE) weighting in the large-sample limit. To compute error bars on the collective predictions we propose a decentralized Jackknife procedure that evaluates the sensitivity of our mechanism to a single agent's prediction. Empirically, we demonstrate that our scheme effectively combines models with differing quality across the input space. The proposed consensus prediction achieves significant gains over classical model averaging, and even outperforms weighted averaging schemes that have access to additional validation data.
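
逆 MSE 加权与留一代理的刀切法都可以几行代码写出(示意性实现,假设各代理已给出点预测及其 MSE 估计;具体数值为本文虚构):

```python
import numpy as np

def collective_predict(preds, mses):
    """按逆 MSE 加权合并各代理的点预测——论文机制的大样本极限。"""
    w = 1.0 / np.asarray(mses, dtype=float)
    w /= w.sum()
    return float(np.dot(w, preds)), w

def jackknife_loo(preds, mses):
    """留一代理的集体预测,去中心化刀切法误差线的示意。"""
    n = len(preds)
    return np.array([collective_predict(np.delete(preds, i),
                                        np.delete(mses, i))[0]
                     for i in range(n)])

preds = np.array([1.0, 1.2, 3.0])
mses = np.array([0.1, 0.2, 2.0])       # 第三个代理噪声大,权重应最小
yhat, w = collective_predict(preds, mses)
loo = jackknife_loo(preds, mses)
```

留一预测 loo 的离散程度反映了集体预测对任一代理的敏感性,可据此构造误差线。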

【4】 groupShapley: Efficient prediction explanation with Shapley values for feature groups 标题:groupShapley:特征组Shapley值的有效预测解释

作者:Martin Jullum,Annabelle Redelmeier,Kjersti Aas 链接:https://arxiv.org/abs/2106.12228 摘要:Shapley值已经成为解释复杂机器学习模型预测的最合适且理论上最有依据的框架之一。Shapley值在解释场景中的流行可能源于其独特的理论性质。然而,Shapley值的主要缺点是其计算复杂度随输入特征数呈指数增长,这使得它在可能有成百上千个特征的许多现实情况下不可行。此外,在特征很多(且相互依赖)时,呈现/可视化和解释计算出的Shapley值也变得困难。本文介绍了groupShapley:一种处理上述瓶颈的概念上简单的方法。其思想是将特征分组,例如按类型或相关性分组,然后为这些组而不是所有单个特征计算并呈现Shapley值。将成百上千个特征减少到六个左右,使得精确计算切实可行,呈现和知识提取也大大简化。我们证明了在一定条件下,groupShapley等价于对每个特征组内各特征的Shapley值求和。此外,我们提供了一个模拟研究,举例说明这些条件不满足时的差异。我们在一个真实的汽车保险示例中说明了该方法的可用性,其中groupShapley用于提供简单直观的解释。 摘要:Shapley values has established itself as one of the most appropriate and theoretically sound frameworks for explaining predictions from complex machine learning models. The popularity of Shapley values in the explanation setting is probably due to its unique theoretical properties. The main drawback with Shapley values, however, is that its computational complexity grows exponentially in the number of input features, making it unfeasible in many real world situations where there could be hundreds or thousands of features. Furthermore, with many (dependent) features, presenting/visualizing and interpreting the computed Shapley values also becomes challenging. The present paper introduces groupShapley: a conceptually simple approach for dealing with the aforementioned bottlenecks. The idea is to group the features, for example by type or dependence, and then compute and present Shapley values for these groups instead of for all individual features. Reducing hundreds or thousands of features to half a dozen or so, makes precise computations practically feasible and the presentation and knowledge extraction greatly simplified. We prove that under certain conditions, groupShapley is equivalent to summing the feature-wise Shapley values within each feature group. Moreover, we provide a simulation study exemplifying the differences when these conditions are not met.
We illustrate the usability of the approach in a real world car insurance example, where groupShapley is used to provide simple and intuitive explanations.
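摘要中"在一定条件下,groupShapley 等价于将每个特征组内各特征的 Shapley 值相加"这一性质,可以用下面的极简 Python 草图说明(特征名、数值与分组均为虚构的演示数据,并非论文实验):

```python
# Hypothetical sketch: under the paper's stated conditions, groupShapley
# reduces to summing the feature-wise Shapley values within each group.
def group_shapley(feature_shapley, groups):
    """feature_shapley: {feature: Shapley value};
    groups: {group name: list of features} -- assumed to partition the features."""
    return {g: sum(feature_shapley[f] for f in feats)
            for g, feats in groups.items()}

# Toy car-insurance-style example (all names and values invented):
phi = {"age": 0.4, "mileage": 0.1, "car_value": 0.3, "region": -0.2}
groups = {"driver": ["age"],
          "vehicle": ["mileage", "car_value"],
          "location": ["region"]}
print(group_shapley(phi, groups))
```

这样,成百上千个特征级贡献被压缩为少数几个组级贡献,呈现与解读都随之简化。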

【5】 Deep learning for improved global precipitation in numerical weather prediction systems 标题:数值天气预报系统中改进全球降水的深度学习

作者:Manmeet Singh,Bipin Kumar,Dev Niyogi,Suryachandra Rao,Sukhpal Singh Gill,Rajib Chattopadhyay,Ravi S Nanjundiah 机构: Indian Institute of Tropical Meteorology, Ministry of Earth Sciences, India, Jackson School of Geosciences, University of Texas at Austin, USA, School of Electrical Engineering and Computer Science, Queen Mary University of London, UK 链接:https://arxiv.org/abs/2106.12045 摘要:在最先进的天气和气候模型中,降水的形成是一个重要的过程。了解其与其他变量的关系可以带来无穷的好处,特别是对于依靠降雨维持生计的世界季风地区。各种因素在降雨的形成中起着至关重要的作用,而这些物理过程正导致业务天气预报的显著偏差。我们使用具有残差学习的深度卷积神经网络的UNET结构作为概念证明来学习全球数据驱动的降水模型。模型在投影到立方球面投影上的再分析数据集上训练,以最小化由于球面畸变引起的误差。结果与印度气象部门使用的业务动态模式进行了比较。与业务系统相比,基于深度学习的模型在网格点技巧和区域平均技巧(均以皮尔逊相关系数衡量)上提高了一倍。这项研究是一个概念证明,表明基于残差学习的UNET可以揭示与目标降水量之间的物理关系,这些物理约束可以用于改进降水预报的动态业务模式。我们的结果为未来在线混合模型的发展铺平了道路。 摘要:The formation of precipitation in state-of-the-art weather and climate models is an important process. The understanding of its relationship with other variables can lead to endless benefits, particularly for the world's monsoon regions dependent on rainfall as a support for livelihood. Various factors play a crucial role in the formation of rainfall, and those physical processes are leading to significant biases in the operational weather forecasts. We use the UNET architecture of a deep convolutional neural network with residual learning as a proof of concept to learn global data-driven models of precipitation. The models are trained on reanalysis datasets projected on the cubed-sphere projection to minimize errors due to spherical distortion. The results are compared with the operational dynamical model used by the India Meteorological Department. The theoretical deep learning-based model shows doubling of the grid point, as well as area averaged skill measured in Pearson correlation coefficients relative to operational system. 
This study is a proof-of-concept showing that residual learning-based UNET can unravel physical relationships to target precipitation, and those physical constraints can be used in the dynamical operational models towards improved precipitation forecasts. Our results pave the way for the development of online, hybrid models in the future.

其他神经网络|深度学习|模型|建模(21篇)

【1】 Weisfeiler and Lehman Go Cellular: CW Networks 标题:Weisfeiler和Lehman Go Cellular:CW网络

作者:Cristian Bodnar,Fabrizio Frasca,Nina Otter,Yu Guang Wang,Pietro Liò,Guido Montúfar,Michael Bronstein 机构:University of Cambridge, Imperial College London & Twitter, MPI-MIS, SJTU & UNSW, MPI-MIS & UCLA 备注:28 pages, 7 figures 链接:https://arxiv.org/abs/2106.12575 摘要:图形神经网络(GNNs)的表达能力有限,难以处理长程的相互作用,缺乏建立高阶结构模型的原则性方法。这些问题可以归结为计算图和输入图结构之间的强耦合。最近提出的消息传递单纯形网络通过在图的团复形上执行消息传递,自然地将这些元素解耦。然而,这些模型受到单纯形复形刚性组合结构的严格限制。在这项工作中,我们将最近关于SCs的理论结果推广到规则细胞复合体、灵活地包含SCs和图的拓扑对象。我们证明,这种泛化提供了一组强大的图“提升”变换,每一个变换都导致一个独特的层次消息传递过程。由此产生的方法,我们统称为CW网络(CWN),严格来说比WL测试更强大,在某些情况下,不比3-WL测试弱。特别是,我们证明了一个这样的方案,基于环,当应用于分子图问题的有效性。所提出的架构得益于可证明的比常用GNNs更大的表达能力、高阶信号的原则性建模以及节点间距离的压缩。我们证明了我们的模型在各种分子数据集上取得了最新的结果。 摘要:Graph Neural Networks (GNNs) are limited in their expressive power, struggle with long-range interactions and lack a principled way to model higher-order structures. These problems can be attributed to the strong coupling between the computational graph and the input graph structure. The recently proposed Message Passing Simplicial Networks naturally decouple these elements by performing message passing on the clique complex of the graph. Nevertheless, these models are severely constrained by the rigid combinatorial structure of Simplicial Complexes (SCs). In this work, we extend recent theoretical results on SCs to regular Cell Complexes, topological objects that flexibly subsume SCs and graphs. We show that this generalisation provides a powerful set of graph ``lifting'' transformations, each leading to a unique hierarchical message passing procedure. The resulting methods, which we collectively call CW Networks (CWNs), are strictly more powerful than the WL test and, in certain cases, not less powerful than the 3-WL test. In particular, we demonstrate the effectiveness of one such scheme, based on rings, when applied to molecular graph problems. 
The proposed architecture benefits from provably larger expressivity than commonly used GNNs, principled modelling of higher-order signals and from compressing the distances between nodes. We demonstrate that our model achieves state-of-the-art results on a variety of molecular datasets.

【2】 Gradient-Based Interpretability Methods and Binarized Neural Networks 标题:基于梯度的可解释性方法与二值化神经网络

作者:Amy Widdicombe,Simon J. Julier 机构:Department of Computer Science, University College London 备注:Accepted at the ICML 2021 Workshop on Theoretic Foundation, Criticism & Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.12569 摘要:二值化神经网络(BNNs)有可能彻底改变在边缘计算平台上进行深度学习的方式。然而,可解释性方法在这些网络上的有效性尚未得到评估。本文比较了几种常用的基于显著图的可解释性技术(Gradient、SmoothGrad和GradCAM)应用于二值化或全精度神经网络(FPNNs)时的性能。我们发现,基本的Gradient方法为这两种类型的网络生成外观非常相似的显著图。然而,SmoothGrad为BNN生成的显著图噪声明显更大。GradCAM生成的显著图也因网络类型而异,其中一些BNN的解释看起来毫无意义。我们讨论了这些解释差异的可能原因,并以此为例说明为什么可解释性技术应当在更广泛的网络类型上进行测试。 摘要:Binarized Neural Networks (BNNs) have the potential to revolutionize the way that deep learning is carried out in edge computing platforms. However, the effectiveness of interpretability methods on these networks has not been assessed. In this paper, we compare the performance of several widely used saliency map-based interpretability techniques (Gradient, SmoothGrad and GradCAM), when applied to Binarized or Full Precision Neural Networks (FPNNs). We found that the basic Gradient method produces very similar-looking maps for both types of network. However, SmoothGrad produces significantly noisier maps for BNNs. GradCAM also produces saliency maps which differ between network types, with some of the BNNs having seemingly nonsensical explanations. We comment on possible reasons for these differences in explanations and present it as an example of why interpretability techniques should be tested on a wider range of network types.
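SmoothGrad 的核心做法可以用几行代码概括:对输入加入高斯噪声,再对各副本的显著图(输入梯度)取平均。下面是一个假设性的极简示例,用一个解析梯度代替真实网络的自动微分(BNN/FPNN 的情形只需把 grad_fn 换成相应网络的输入梯度):

```python
import numpy as np

def grad_f(x):
    # Analytic input gradient of the toy model f(x) = sum(x**2);
    # for a real BNN/FPNN this gradient would come from autodiff.
    return 2 * x

def smoothgrad(grad_fn, x, sigma=0.1, n=500, seed=0):
    """Average the saliency over n Gaussian-perturbed copies of the input."""
    rng = np.random.default_rng(seed)
    noisy = x + sigma * rng.normal(size=(n,) + x.shape)
    return np.mean([grad_fn(xi) for xi in noisy], axis=0)

x = np.array([1.0, -2.0, 0.5])
sg = smoothgrad(grad_f, x)   # close to the plain gradient 2*x for this smooth f
```

对这种光滑的玩具函数,SmoothGrad 与普通梯度几乎一致;文中观察到的明显噪声差异,正是在 BNN 这类高度非光滑模型上才出现的。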

【3】 Feature Alignment for Approximated Reversibility in Neural Networks 标题:神经网络中近似可逆性的特征对齐

作者:Tiago de Souza Farias,Jonas Maziero 机构:Departamento de F´ısica, Centro de Ciˆencias Naturais e Exatas, Universidade Federal de Santa Maria, Avenida Roraima , Santa Maria, Rio Grande do Sul,-, Brazil 备注:21 pages 链接:https://arxiv.org/abs/2106.12562 摘要:我们介绍了特征对齐,一种在人工神经网络中获得近似可逆性的技术。通过特征提取,我们可以训练一个神经网络来学习从输出到输入的反向过程的估计映射。结合变分自动编码器,我们可以从与训练数据相同的统计量中产生新的样本。利用生成对抗网络中的概念,对结果进行了改进。最后,我们证明了该技术可以进行局部训练,节省计算内存资源。应用这些技术,我们报告了三个视觉生成任务的结果:MNIST、CIFAR-10和celebA。 摘要:We introduce feature alignment, a technique for obtaining approximate reversibility in artificial neural networks. By means of feature extraction, we can train a neural network to learn an estimated map for its reverse process from outputs to inputs. Combined with variational autoencoders, we can generate new samples from the same statistics as the training data. Improvements of the results are obtained by using concepts from generative adversarial networks. Finally, we show that the technique can be modified for training neural networks locally, saving computational memory resources. Applying these techniques, we report results for three vision generative tasks: MNIST, CIFAR-10, and celebA.

【4】 Real-time Outdoor Localization Using Radio Maps: A Deep Learning Approach 标题:利用无线电地图进行户外实时定位:一种深度学习方法

作者:Çağkan Yapar,Ron Levie,Gitta Kutyniok,Giuseppe Caire 机构:Institute of Telecommunication Systems, TU Berlin, Department of Mathematics, LMU München, Department of Physics and Technology, University of Tromsø 链接:https://arxiv.org/abs/2106.12556 摘要:本文研究密集城市环境下蜂窝网络的定位问题。全球导航卫星系统在城市环境中的性能通常较差,因为设备和卫星之间存在视线条件的可能性较低,因此需要替代定位方法以获得较高的精度。我们提出了一种仅基于路径损耗的深度学习定位方法,与依赖到达时间或到达角信息的方法不同,该方法不需要增加用户设备相对于标准操作的计算复杂度。在无线网络中,用户设备扫描基站信标时隙并识别为数不多的最强基站信号,以用于切换和用户-基站关联目的。在所提出的方法中,待定位的用户只需向位于云中的中央处理单元报告这样的接收信号强度。对于每个基站,我们在地图的密集网格中的每个位置都有很好的路径损耗近似值。这个近似值是由RadioUNet提供的,RadioUNet是一个基于深度学习的城市环境路径损耗函数模拟器,我们之前已经提出并发表了这个模拟器。利用估计出的所有基站的路径损耗无线地图和相应的报告信号强度,提出的深度学习算法可以提取出非常准确的用户定位。所提出的方法称为LocUNet,对估计无线地图中的不准确性具有很高的鲁棒性。我们通过数值实验证明了这一点,得到了最新的结果。 摘要:This paper deals with the problem of localization in a cellular network in a dense urban scenario. Global Navigation Satellite Systems typically perform poorly in urban environments, where the likelihood of line-of-sight conditions between the devices and the satellites is low, and thus alternative localization methods are required for good accuracy. We present a deep learning method for localization, based merely on pathloss, which does not require any increase in computation complexity at the user devices with respect to the device standard operations, unlike methods that rely on time of arrival or angle of arrival information. In a wireless network, user devices scan the base station beacon slots and identify the few strongest base station signals for handover and user-base station association purposes. In the proposed method, the user to be localized simply reports such received signal strengths to a central processing unit, which may be located in the cloud. For each base station we have good approximation of the pathloss at every location in a dense grid in the map. This approximation is provided by RadioUNet, a deep learning-based simulator of pathloss functions in urban environment, that we have previously proposed and published. 
Using the estimated pathloss radio maps of all base stations and the corresponding reported signal strengths, the proposed deep learning algorithm can extract a very accurate localization of the user. The proposed method, called LocUNet, enjoys high robustness to inaccuracies in the estimated radio maps. We demonstrate this by numerical experiments, which obtain state-of-the-art results.
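"由各基站的路径损耗无线地图与上报信号强度推断位置"这一输入输出关系,可以用一个朴素的网格匹配草图来演示(LocUNet 本身是学习得到的网络,这里的最近匹配只是示意;地图数据为随机生成的假设值):

```python
import numpy as np

# Hypothetical nearest-fit sketch: pick the grid location whose pathloss
# values across all base stations best match the user's reported strengths.
rng = np.random.default_rng(1)
maps = rng.normal(size=(3, 20, 20))           # pathloss maps: 3 base stations, 20x20 grid
true_loc = (7, 12)
reported = maps[:, true_loc[0], true_loc[1]]  # strengths the user reports

err = ((maps - reported[:, None, None]) ** 2).sum(axis=0)
est = np.unravel_index(np.argmin(err), err.shape)  # estimated grid location
```

真实场景中地图是 RadioUNet 的估计、报告带噪声,因此才需要一个对地图误差鲁棒的学习模型,而不是这种逐格匹配。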

【5】 Learning Stochastic Majority Votes by Minimizing a PAC-Bayes Generalization Bound 标题:最小化PAC-Bayes泛化界学习随机多数票

作者:Valentina Zantedeschi,Paul Viallard,Emilie Morvant,Rémi Emonet,Amaury Habrard,Pascal Germain,Benjamin Guedj 机构:Inria, Lille - Nord Europe, Inria-London - The Inria London Programme, University College London, Centre for Artificial Intelligence, Univ Lyon, UJM-Saint-Etienne, CNRS, Institut d'Optique Graduate School, Laboratoire Hubert Curien UMR , F-, SAINT-ETIENNE, France 链接:https://arxiv.org/abs/2106.12535 摘要:我们研究了有限分类器集合上多数票的随机对应,并研究了它的泛化性质。虽然我们的方法适用于任意分布,但我们用Dirichlet分布来实例化它:这允许期望风险的封闭形式且可微的表达式,进而将泛化界转化为一个可处理的训练目标。在一系列数值实验中,与同样最小化PAC-Bayes目标的竞争算法(无论使用数据无关的先验还是数据相关的先验)相比,由此产生的随机多数投票学习算法达到了最先进的精度,并受益于(非空洞的)紧泛化界。 摘要:We investigate a stochastic counterpart of majority votes over finite ensembles of classifiers, and study its generalization properties. While our approach holds for arbitrary distributions, we instantiate it with Dirichlet distributions: this allows for a closed-form and differentiable expression for the expected risk, which then turns the generalization bound into a tractable training objective. The resulting stochastic majority vote learning algorithm achieves state-of-the-art accuracy and benefits from (non-vacuous) tight generalization bounds, in a series of numerical experiments when compared to competing algorithms which also minimize PAC-Bayes objectives -- both with uninformed (data-independent) and informed (data-dependent) priors.
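摘要中被优化的"随机多数票"对象可以用蒙特卡洛方式直观展示:从 Dirichlet 分布抽取分类器权重,对每次抽样计算加权多数票的错误率(论文给出的是期望风险的闭式可微表达式;这里的抽样以及预测矩阵、标签、Dirichlet 参数都只是假设的玩具数据):

```python
import numpy as np

rng = np.random.default_rng(0)
preds = np.array([[1, 1, 0, 1],    # predictions of 3 classifiers on 4 examples
                  [1, 0, 0, 1],
                  [0, 1, 1, 1]])
labels = np.array([1, 1, 0, 1])
alpha = np.array([2.0, 1.0, 1.0])  # Dirichlet parameters over classifier weights

risks = []
for w in rng.dirichlet(alpha, size=4000):
    vote = (w @ preds > 0.5).astype(int)   # weighted majority vote for one draw
    risks.append((vote != labels).mean())
expected_risk = float(np.mean(risks))       # Monte-Carlo estimate of the expected risk
```

学习算法正是以这种期望风险(的界)作为可微训练目标来优化 Dirichlet 参数的。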

【6】 Coarse-to-Fine Q-attention: Efficient Learning for Visual Robotic Manipulation via Discretisation 标题:从粗到精的Q-注意:基于离散化的视觉机器人操作的有效学习

作者:Stephen James,Kentaro Wada,Tristan Laidlow,Andrew J. Davison 机构:Dyson Robotics Lab, Imperial College London 备注:Videos and code found at this https URL 链接:https://arxiv.org/abs/2106.12534 摘要:回顾过去几年,深度强化学习(RL)的最大突破是在离散动作领域。然而,机器人操作本身就是一个连续的控制环境,而这些连续控制强化学习算法往往依赖于演员-评论家方法,由于演员和评论家的联合优化,这些方法样本效率低,训练难度大。为此,我们探讨了如何将离散动作RL算法的稳定性引入机器人操作领域。我们扩展了最近发布的ARM算法,将连续的下一最佳姿态(next-best pose)智能体替换为离散的下一最佳姿态智能体。考虑到旋转的有界性,旋转的离散化是微不足道的,而平移本质上是无界的,这使得离散化很困难。通过对三维空间的离散化,将平移预测转化为体素预测问题;然而,大型工作空间的体素化是内存密集型的,无法使用高密度的体素,而高密度体素对于获得机器人操作所需的分辨率至关重要。因此,我们建议通过逐渐提高分辨率,从粗到细地应用这种体素预测。在每一步中,我们提取最高值的体素作为预测位置,然后将其作为下一步更高分辨率体素化的中心。这种从粗到精的预测应用于多个步骤,给出了几乎无损的平移预测。结果表明,与连续控制算法相比,本文提出的由粗到精算法能更有效地完成RLBench任务,甚至能在不到7分钟的时间内从零开始(tabula rasa)训练出一些真实任务,只需3次演示。此外,我们还表明,通过转向体素表示,我们能够很容易地合并来自多个摄像头的观察。 摘要:Reflecting on the last few years, the biggest breakthroughs in deep reinforcement learning (RL) have been in the discrete action domain. Robotic manipulation, however, is inherently a continuous control environment, but these continuous control reinforcement learning algorithms often depend on actor-critic methods that are sample-inefficient and inherently difficult to train, due to the joint optimisation of the actor and critic. To that end, we explore how we can bring the stability of discrete action RL algorithms to the robot manipulation domain. We extend the recently released ARM algorithm, by replacing the continuous next-best pose agent with a discrete next-best pose agent. Discretisation of rotation is trivial given its bounded nature, while translation is inherently unbounded, making discretisation difficult. We formulate the translation prediction as the voxel prediction problem by discretising the 3D space; however, voxelisation of a large workspace is memory intensive and would not work with a high density of voxels, crucial to obtaining the resolution needed for robotic manipulation. We therefore propose to apply this voxel prediction in a coarse-to-fine manner by gradually increasing the resolution. 
In each step, we extract the highest valued voxel as the predicted location, which is then used as the centre of the higher-resolution voxelisation in the next step. This coarse-to-fine prediction is applied over several steps, giving a near-lossless prediction of the translation. We show that our new coarse-to-fine algorithm is able to accomplish RLBench tasks much more efficiently than the continuous control equivalent, and even train some real-world tasks, tabula rasa, in less than 7 minutes, with only 3 demonstrations. Moreover, we show that by moving to a voxel representation, we are able to easily incorporate observations from multiple cameras.
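摘要中"取值最高的体素作为下一步更高分辨率体素化的中心"的由粗到精过程,可以用一维情形的小草图说明:每一步在当前窗口内以固定的低分辨率离散化,选出值最高的体素并在其内部继续细化(目标函数与参数均为假设的演示值):

```python
import numpy as np

def coarse_to_fine_argmax(value_fn, lo, hi, steps=4, res=8):
    """Coarse-to-fine refinement in 1D: voxelise the current window at a fixed
    low resolution, keep the highest-valued voxel, and recurse into it."""
    for _ in range(steps):
        width = (hi - lo) / res
        centres = lo + (np.arange(res) + 0.5) * width
        best = int(np.argmax([value_fn(c) for c in centres]))
        lo, hi = lo + best * width, lo + (best + 1) * width
    return 0.5 * (lo + hi)

# After 4 steps at resolution 8 the window has shrunk by 8**4 = 4096,
# giving a near-lossless estimate of the maximiser without a dense grid.
est = coarse_to_fine_argmax(lambda x: -(x - 3.21) ** 2, 0.0, 10.0)
```

三维情形只是把窗口换成体素立方体,从而避免一次性对整个工作空间做高密度体素化带来的内存开销。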

【7】 Bayesian Deep Learning Hyperparameter Search for Robust Function Mapping to Polynomials with Noise 标题:贝叶斯深度学习超参数搜索在含噪声多项式稳健函数映射中的应用

作者:Nidhin Harilal,Udit Bhatia,Auroop R. Ganguly 机构:Indian Institute of Technology Gandhinagar, Gujarat, India, Northeastern University, Boston, MA, USA, Pacific Northwest National Laboratory, Richland, WA, USA 链接:https://arxiv.org/abs/2106.12532 摘要:最近的文献报道了神经结构搜索的进展,以及连接结构的可解释性和可解释性。然而,我们对于如何设计贝叶斯深度学习(BDL)超参数,特别是深度、宽度和集合大小的理解,对于具有不确定性量化的鲁棒函数映射,仍然是新兴的。本文试图通过将贝叶斯连接表示映射到具有不同噪声类型和比率的不同阶多项式来加深我们的理解。我们研究噪声污染多项式来寻找超参数的组合,这些超参数可以提取出潜在的多项式信号,同时基于噪声属性量化不确定性。具体地说,我们试图研究这样一个问题:可以找到一个合适的神经结构和集合配置来检测任何n阶多项式的信号,该信号被具有不同分布和信噪比以及不同噪声属性的噪声污染。我们的结果表明,可能存在一个最佳的网络深度以及预测技巧和不确定性量化的最佳集合数,分别。然而,宽度的最优性是不可辨别的,即使在高宽度值时,性能增益随着宽度的增加而减小。我们的实验和见解对理解BDL表示的理论性质和设计实际的解决方案具有指导意义。 摘要:Advances in neural architecture search, as well as explainability and interpretability of connectionist architectures, have been reported in the recent literature. However, our understanding of how to design Bayesian Deep Learning (BDL) hyperparameters, specifically, the depth, width and ensemble size, for robust function mapping with uncertainty quantification, is still emerging. This paper attempts to further our understanding by mapping Bayesian connectionist representations to polynomials of different orders with varying noise types and ratios. We examine the noise-contaminated polynomials to search for the combination of hyperparameters that can extract the underlying polynomial signals while quantifying uncertainties based on the noise attributes. Specifically, we attempt to study the question that an appropriate neural architecture and ensemble configuration can be found to detect a signal of any n-th order polynomial contaminated with noise having different distributions and signal-to-noise (SNR) ratios and varying noise attributes. Our results suggest the possible existence of an optimal network depth as well as an optimal number of ensembles for prediction skills and uncertainty quantification, respectively. 
However, optimality is not discernible for width, even though the performance gain reduces with increasing width at high values of width. Our experiments and insights can be directional to understand theoretical properties of BDL representations and to design practical solutions.

【8】 Universal Consistency of Deep Convolutional Neural Networks 标题:深卷积神经网络的泛相合性

作者:Shao-Bo Lin,Kaidong Wang,Yao Wang,Ding-Xuan Zhou 机构:School of Management, Xi'an Jiaotong University; School of Data Science and Department of Mathematics, City University of Hong Kong 备注:9 pages, 4 figures 链接:https://arxiv.org/abs/2106.12498 摘要:与实践中大量的深度卷积神经网络研究相比,深度卷积神经网络的理论行为研究相对滞后。特别是,DCNNs的普遍一致性仍然是开放问题。在本文中,我们证明了在具有扩张卷积(零填充)的DCNNs上实现经验风险最小化是强普遍一致的。受这一普遍一致性的启发,我们进行了一系列实验,结果表明,在没有全连接层的情况下,具有扩张卷积的DCNN的性能并不比广泛使用的混合结构深层神经网络差,后者包含收缩(无零填充)卷积层和多个全连接层。 摘要:Compared with avid research activities of deep convolutional neural networks (DCNNs) in practice, the study of theoretical behaviors of DCNNs lags heavily behind. In particular, the universal consistency of DCNNs remains open. In this paper, we prove that implementing empirical risk minimization on DCNNs with expansive convolution (with zero-padding) is strongly universally consistent. Motivated by the universal consistency, we conduct a series of experiments to show that without any fully connected layers, DCNNs with expansive convolution perform not worse than the widely used deep neural networks with hybrid structure containing contracting (without zero-padding) convolution layers and several fully connected layers.

【9】 AC/DC: Alternating Compressed/DeCompressed Training of Deep Neural Networks 标题:AC/DC:深度神经网络的交替压缩/解压缩训练

作者:Alexandra Peste,Eugenia Iofinova,Adrian Vladu,Dan Alistarh 机构: Université de Paris 链接:https://arxiv.org/abs/2106.12379 摘要:随着深度神经网络(DNNs)计算需求的不断增加,人们对获得既稀疏又精确的DNN模型越来越感兴趣。最近的工作研究了更困难的稀疏训练情况,其中DNN权重尽可能地已经稀疏,以减少训练期间的计算成本。现有的稀疏训练方法主要是基于经验的,相对于密集基线,其精度往往较低。在本文中,我们提出了一种通用的DNN交替压缩/解压缩(AC/DC)训练方法,证明了该算法一个变体的收敛性,并证明了在相近的计算量下,AC/DC的精度优于现有的稀疏训练方法;在高稀疏度下,AC/DC甚至优于依赖于精确的预训练密集模型的现有方法。AC/DC的一个重要特性是它允许密集和稀疏模型的联合训练,在训练过程结束时产生精确的稀疏-密集模型对。这在实践中很有用:在资源受限的环境中,可能需要部署压缩变体而无需重新执行整个训练流程;同时这也为理解密集模型与压缩模型之间的精度差距提供了见解。 摘要:The increasing computational requirements of deep neural networks (DNNs) have led to significant interest in obtaining DNN models that are sparse, yet accurate. Recent work has investigated the even harder case of sparse training, where the DNN weights are, for as much as possible, already sparse to reduce computational costs during training. Existing sparse training methods are mainly empirical and often have lower accuracy relative to the dense baseline. In this paper, we present a general approach called Alternating Compressed/DeCompressed (AC/DC) training of DNNs, demonstrate convergence for a variant of the algorithm, and show that AC/DC outperforms existing sparse training methods in accuracy at similar computational budgets; at high sparsity levels, AC/DC even outperforms existing methods that rely on accurate pre-trained dense models. An important property of AC/DC is that it allows co-training of dense and sparse models, yielding accurate sparse-dense model pairs at the end of the training process. This is useful in practice, where compressed variants may be desirable for deployment in resource-constrained settings without re-doing the entire training flow, and also provides us with insights into the accuracy gap between dense and compressed models.
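"压缩/解压缩交替"这一思想可以在一个玩具线性回归上演示:偶数阶段全权重训练(解压缩),奇数阶段按幅值剪除最小的权重并在掩码下继续训练(压缩),最后停在压缩阶段即得到稀疏模型。这只是对思想的示意性草图,并非论文的实现或超参数:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 0.5]   # sparse ground truth
y = X @ true_w

w = np.zeros(10)
n_prune = 7                      # prune 70% of the weights in compressed phases

# Alternate decompressed (dense) and compressed (masked) phases; the sparsity
# mask is recomputed from weight magnitudes at the start of each compressed
# phase, and training ends on a compressed phase, yielding a sparse model.
for phase in range(6):
    mask = np.ones(10)
    if phase % 2 == 1:           # compressed phase: drop smallest-magnitude weights
        mask[np.argsort(np.abs(w))[:n_prune]] = 0.0
    for _ in range(200):
        grad = X.T @ (X @ w - y) / len(y)
        w = (w - 0.1 * grad) * mask
```

在这个无噪声的例子里,交替训练最终恰好恢复三个非零权重;密集阶段结束时的 w 则对应于配对中的密集模型。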

【10】 Learning Under Delayed Feedback: Implicitly Adapting to Gradient Delays 标题:延迟反馈下的学习:隐式适应梯度延迟

作者:Rotem Zamir Aviv,Ido Hakimi,Assaf Schuster,Kfir Y. Levy 机构:Department of Electrical and Computer Engineering, Technion, Department of Computer Science, Technion, A Viterbi Fellow 备注:to be published in ICML 2021 链接:https://arxiv.org/abs/2106.12261 摘要:我们考虑随机凸优化问题,其中多台机器异步并行运行,同时共享一个公共内存。我们提出了一种鲁棒的训练方法,并得到了不依赖于更新延迟、目标光滑性和梯度方差先验知识的非渐近收敛性保证。相反,用于此设置的现有方法严重依赖于此先验知识,这使得它们不适用于基本上所有共享资源的计算环境,如云和数据中心。具体地说,现有的方法无法适应机器动态分配引起的延迟变化,而我们的方法隐含地适应了这种变化。 摘要:We consider stochastic convex optimization problems, where several machines act asynchronously in parallel while sharing a common memory. We propose a robust training method for the constrained setting and derive non asymptotic convergence guarantees that do not depend on prior knowledge of update delays, objective smoothness, and gradient variance. Conversely, existing methods for this setting crucially rely on this prior knowledge, which render them unsuitable for essentially all shared-resources computational environments, such as clouds and data centers. Concretely, existing approaches are unable to accommodate changes in the delays which result from dynamic allocation of the machines, while our method implicitly adapts to such changes.
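"梯度延迟"可以用一个一维玩具例子直观化:每次更新使用的是若干步之前的过期参数上计算的梯度;只要步长足够小,迭代仍然收敛。下面是示意性草图(并非论文的自适应方法——论文的要点恰恰是无需预知延迟、光滑度与方差也能获得保证):

```python
import numpy as np

def delayed_sgd(grad, w0, lr, delay, steps):
    """Gradient descent where each update uses the gradient evaluated at the
    parameters from `delay` steps earlier (a toy model of asynchronous workers)."""
    w = np.array(w0, dtype=float)
    history = [w.copy()] * (delay + 1)   # parameter snapshots, oldest first
    for _ in range(steps):
        g = grad(history[0])             # gradient at delay-steps-old parameters
        w = w - lr * g
        history = history[1:] + [w.copy()]
    return w

# Minimising f(w) = (w - 3)^2 with a fixed delay still converges for a small step.
w = delayed_sgd(lambda v: 2 * (v - 3.0), np.array([10.0]), lr=0.05, delay=3, steps=600)
```

在共享资源环境中延迟还会随机器动态分配而变化,这正是需要对延迟隐式自适应、而非依赖此类固定先验知识的原因。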

【11】 A Unified Approach to Fair Online Learning via Blackwell Approachability 标题:通过Blackwell可接近性实现公平在线学习的统一方法

作者:Evgenii Chzhen,Christophe Giraud,Gilles Stoltz 机构:Université Paris-Saclay, CNRS, Laboratoire de mathématiques d’Orsay, Orsay, France 链接:https://arxiv.org/abs/2106.12242 摘要:我们提供了一个设置和一般方法,公平的在线学习随机敏感和非敏感的背景。场景是玩家和自然之间的重复游戏,在每个阶段,双方都根据上下文选择动作。受无意识概念的启发,我们假设玩家在做出决定之前只能访问非敏感上下文,同时我们讨论了自然访问敏感上下文和自然不知道敏感上下文的两种情况。利用Blackwell的可接近性理论处理未知上下文分布的情况,给出了学习目标与公平约束相容的一般充要条件。这一条件在(分组)无遗憾和(分组)校准目标以及作为附加约束的人口均等上被实例化。当目标与约束不相容时,所提供的框架允许描述两者之间的最佳权衡。 摘要:We provide a setting and a general approach to fair online learning with stochastic sensitive and non-sensitive contexts. The setting is a repeated game between the Player and Nature, where at each stage both pick actions based on the contexts. Inspired by the notion of unawareness, we assume that the Player can only access the non-sensitive context before making a decision, while we discuss both cases of Nature accessing the sensitive contexts and Nature unaware of the sensitive contexts. Adapting Blackwell's approachability theory to handle the case of an unknown contexts' distribution, we provide a general necessary and sufficient condition for learning objectives to be compatible with some fairness constraints. This condition is instantiated on (group-wise) no-regret and (group-wise) calibration objectives, and on demographic parity as an additional constraint. When the objective is not compatible with the constraint, the provided framework permits to characterise the optimal trade-off between the two.

【12】 Multiband VAE: Latent Space Partitioning for Knowledge Consolidation in Continual Learning 标题:多频带VAE:持续学习中知识巩固的潜在空间划分

作者:Kamil Deja,Paweł Wawrzyński,Daniel Marczak,Wojciech Masarczyk,Tomasz Trzciński 机构:Warsaw University of Technology ,Tooploox 链接:https://arxiv.org/abs/2106.12196 摘要:提出了一种新的基于变分自编码潜在空间划分的生成模型无监督连续知识整合方法。获取有关新数据样本的知识而不忘记以前的数据样本是持续学习的一个关键问题。目前提出的方法通过扩展现有的模型来实现这一目标,同时限制其行为在过去的数据上不会退化,这并没有充分利用整个训练数据集中关系的潜力。在这项工作中,我们确定了这一局限性,并将持续学习的目标定位为知识积累任务。我们通过不断地重新排列潜在空间分区来解决这个问题,我们称之为带,带是在不同任务中看到的样本的表示,由它们所包含的信息的相似性驱动。此外,我们还介绍了一种简单而有效的方法来控制遗忘过去的数据,以提高重建质量编码在潜在波段和潜在空间解纠缠技术,提高知识巩固。在标准的持续学习评估基准之上,我们在一个新的知识整合场景中对我们的方法进行了评估,结果表明,所提出的方法在所有测试场景中的性能都比最新的方法高出两倍。 摘要:We propose a new method for unsupervised continual knowledge consolidation in generative models that relies on the partitioning of Variational Autoencoder's latent space. Acquiring knowledge about new data samples without forgetting previous ones is a critical problem of continual learning. Currently proposed methods achieve this goal by extending the existing model while constraining its behavior not to degrade on the past data, which does not exploit the full potential of relations within the entire training dataset. In this work, we identify this limitation and posit the goal of continual learning as a knowledge accumulation task. We solve it by continuously re-aligning latent space partitions that we call bands which are representations of samples seen in different tasks, driven by the similarity of the information they contain. In addition, we introduce a simple yet effective method for controlled forgetting of past data that improves the quality of reconstructions encoded in latent bands and a latent space disentanglement technique that improves knowledge consolidation. On top of the standard continual learning evaluation benchmarks, we evaluate our method on a new knowledge consolidation scenario and show that the proposed approach outperforms state-of-the-art by up to twofold across all testing scenarios.

【13】 Combination of Convolutional Neural Network and Gated Recurrent Unit for Energy Aware Resource Allocation 标题:卷积神经网络与门控递归单元相结合的节能资源分配

作者:Zeinab Khodaverdian,Hossein Sadr,Seyed Ahmad Edalatpanah,Mojdeh Nazari Solimandarabi 机构:Department of Computer Engineering, Ayandegan Institute of Higher Education, Tonekabon, Iran, Department of Computer Engineering, Rasht Branch, Islamic Azad University, Rasht, Iran, Department of Applied Mathematics, Ayandegan Institute of Higher Education 链接:https://arxiv.org/abs/2106.12178 摘要:云计算服务模式经历了快速增长,低效率的资源使用被认为是云数据中心高能耗的最大原因之一。云数据中心中旨在降低能耗的资源分配,使用虚拟机(VM)的实时迁移及其向少量物理机(PM)的整合来实现。然而,选择合适的VM进行迁移是一个重要的挑战。为了解决这个问题,可以根据用户请求的模式将VM分为对延迟敏感或不敏感的类,然后选择合适的VM进行迁移。本文利用卷积神经网络(CNN)和门控递归单元(GRU)相结合的方法对Microsoft Azure数据集中的VM进行分类。由于此数据集中的大多数VM都被标记为对延迟不敏感,因此迁移此组中的更多VM不仅可以减少能耗,而且还可以减少违反服务级别协议(SLA)的情况。基于实证结果,该模型的预测精度为95.18%,与现有模型相比,具有明显的优越性。 摘要:Cloud computing service models have experienced rapid growth and inefficient resource usage is known as one of the greatest causes of high energy consumption in cloud data centers. Resource allocation in cloud data centers aiming to reduce energy consumption has been conducted using live migration of Virtual Machines (VMs) and their consolidation into the small number of Physical Machines (PMs). However, the selection of the appropriate VM for migration is an important challenge. To solve this issue, VMs can be classified according to the pattern of user requests into sensitive or insensitive classes to latency, and thereafter suitable VMs can be selected for migration. In this paper, the combination of Convolutional Neural Network (CNN) and Gated Recurrent Unit (GRU) is utilized for the classification of VMs in the Microsoft Azure dataset. Due to the fact the majority of VMs in this dataset are labeled as insensitive to latency, migration of more VMs in this group not only reduces energy consumption but also decreases the violation of Service Level Agreements (SLA). Based on the empirical results, the proposed model obtained an accuracy of 95.18%, which clearly demonstrates the superiority of our proposed model compared to other existing models.
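CNN+GRU 这类流水线的数据流可以用一个随机权重的 numpy 形状级草图示意:一维卷积在 VM 的资源使用序列上提取局部特征,GRU 沿时间汇总,最后由逻辑斯谛输出给出"延迟敏感"的概率。这只是对结构的假设性演示(权重随机、未经训练,维度均为虚构):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv1d(x, kernels):                      # x: (T,), kernels: (F, k)
    k = kernels.shape[1]
    windows = np.stack([x[t:t + k] for t in range(len(x) - k + 1)])
    return windows @ kernels.T               # local features, shape (T-k+1, F)

def gru(seq, Wz, Uz, Wr, Ur, Wh, Uh):        # seq: (T, F) -> final hidden state (H,)
    h = np.zeros(Uz.shape[0])
    for x in seq:
        z = sigmoid(Wz @ x + Uz @ h)         # update gate
        r = sigmoid(Wr @ x + Ur @ h)         # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h))
        h = (1 - z) * h + z * h_tilde
    return h

F, H, k = 4, 8, 3
kernels = rng.normal(size=(F, k))
params = [rng.normal(size=s) for s in [(H, F), (H, H), (H, F), (H, H), (H, F), (H, H)]]
w_out = rng.normal(size=H)

trace = rng.normal(size=24)                  # 24 usage readings for one VM
p = sigmoid(w_out @ gru(conv1d(trace, kernels), *params))  # P(latency-sensitive)
```

实际系统中这些权重由带标签的 Azure 数据训练得到,输出概率再用于挑选可迁移的延迟不敏感 VM。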

【14】 Imitation Learning: Progress, Taxonomies and Opportunities 标题:模仿学习:进展、分类与机遇

作者:Boyuan Zheng,Sunny Verma,Jianlong Zhou,Ivor Tsang,Fang Chen 机构:University of Technology Sydney, Australia 链接:https://arxiv.org/abs/2106.12177 摘要:模仿学习的目的是从人类专家的演示或人工创建的代理中提取知识,以复制他们的行为。它的成功已经在视频游戏、自动驾驶、机器人模拟和物体操纵等领域得到了证明。然而,这种复制过程可能会有问题,例如性能高度依赖于演示质量,并且大多数经过训练的代理只能在特定任务环境中表现良好。在这项综述中,我们对模仿学习进行了系统的回顾。我们首先介绍发展历史和预备知识等背景,然后介绍模仿学习中的不同分类法和该领域的关键里程碑。最后,我们详细讨论了学习策略方面的挑战,并围绕从次优演示中学习策略、语音指令以及其他相关优化方案提出了研究机会。 摘要:Imitation learning aims to extract knowledge from human experts' demonstrations or artificially created agents in order to replicate their behaviors. Its success has been demonstrated in areas such as video games, autonomous driving, robotic simulations and object manipulation. However, this replicating process could be problematic, such as the performance is highly dependent on the demonstration quality, and most trained agents are limited to perform well in task-specific environments. In this survey, we provide a systematic review on imitation learning. We first introduce the background knowledge from development history and preliminaries, followed by presenting different taxonomies within Imitation Learning and key milestones of the field. We then detail challenges in learning strategies and present research opportunities with learning policy from suboptimal demonstration, voice instructions and other associated optimization schemes.

【15】 Lagrangian dual framework for conservative neural network solutions of kinetic equations 标题:动力学方程保守神经网络解的拉格朗日对偶框架

作者:Hyung Ju Hwang,Hwijae Son 机构:Department of Mathematics, Pohang University of Science and Technology, Pohang, Republic of Korea, Stochastic Analysis and Application Research Center, Korea Advanced Institute of Science and Technology, Daejeon, Republic of Korea 链接:https://arxiv.org/abs/2106.12147 摘要:在本文中,我们提出了一个新的保守公式求解动力学方程通过神经网络。更准确地说,我们将学习问题描述为一个约束优化问题,约束表示物理守恒定律。通过拉格朗日对偶,对剩余损失函数放松了约束。通过将解的物理守恒性质作为学习问题的约束条件,我们证明了动力学Fokker-Planck方程和齐次Boltzmann方程的解在误差和守恒定律方面的更精确的近似。 摘要:In this paper, we propose a novel conservative formulation for solving kinetic equations via neural networks. More precisely, we formulate the learning problem as a constrained optimization problem with constraints that represent the physical conservation laws. The constraints are relaxed toward the residual loss function by the Lagrangian duality. By imposing physical conservation properties of the solution as constraints of the learning problem, we demonstrate far more accurate approximations of the solutions in terms of errors and the conservation laws, for the kinetic Fokker-Planck equation and the homogeneous Boltzmann equation.
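"通过拉格朗日对偶把守恒律作为约束松弛进残差损失"这一思路,可以在一个一般化的玩具约束问题上演示:对参数做梯度下降、对乘子做梯度上升(这并非论文的动力学方程设定,目标函数与约束均为假设的最小示例):

```python
import numpy as np

# Toy sketch of the Lagrangian-dual idea: minimise a residual-style loss
# ||w - a||^2 subject to a conservation-style constraint sum(w) = 1,
# via gradient descent on w and gradient ascent on the multiplier lam.
a = np.array([0.5, 0.8, 0.1])
w = np.zeros(3)
lam = 0.0
for _ in range(4000):
    grad_w = 2 * (w - a) + lam           # d/dw of ||w-a||^2 + lam*(sum(w)-1)
    w = w - 0.05 * grad_w
    lam = lam + 0.05 * (w.sum() - 1.0)   # dual ascent on the constraint violation
```

迭代收敛到满足守恒约束的最近解;在论文中,w 对应神经网络参数,残差损失来自动力学方程,约束则是解的物理守恒量。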

【16】 IQ-Learn: Inverse soft-Q Learning for Imitation 标题:IQ-Learning:用于模仿的逆软Q学习

作者:Divyansh Garg,Shuvam Chakraborty,Chris Cundy,Jiaming Song,Stefano Ermon 机构:Stanford University 链接:https://arxiv.org/abs/2106.12142 摘要:在许多顺序决策问题(如机器人控制、游戏、顺序预测)中,人类或专家数据包含了关于任务的有用信息。然而,在具有复杂动力学的高维环境中,从少量专家数据中进行模仿学习是一项具有挑战性的工作。行为克隆是一种简单的方法,由于其简单的实现和稳定的收敛性而被广泛应用,但它不利用任何涉及环境动态的信息。现有的许多利用动态信息的方法在实际应用中很难训练,这是由于对报酬和策略逼近器或有偏的高方差梯度估计的对抗性优化过程。我们提出了一种动态感知的IL方法,它通过学习一个Q函数来避免对抗性训练,隐式地表示奖励和策略。在标准测试中,隐式学习的奖赏与真实奖赏呈高度正相关,说明我们的方法也可用于逆强化学习。我们的方法,逆软Q学习(IQ-Learn)在离线和在线模拟学习环境中获得了最先进的结果,在所需的环境交互数量和在高维空间中的可伸缩性方面都超过了现有的方法。 摘要:In many sequential decision-making problems (e.g., robotics control, game playing, sequential prediction), human or expert data is available containing useful information about the task. However, imitation learning (IL) from a small amount of expert data can be challenging in high-dimensional environments with complex dynamics. Behavioral cloning is a simple method that is widely used due to its simplicity of implementation and stable convergence but doesn't utilize any information involving the environment's dynamics. Many existing methods that exploit dynamics information are difficult to train in practice due to an adversarial optimization process over reward and policy approximators or biased, high variance gradient estimators. We introduce a method for dynamics-aware IL which avoids adversarial training by learning a single Q-function, implicitly representing both reward and policy. On standard benchmarks, the implicitly learned rewards show a high positive correlation with the ground-truth rewards, illustrating our method can also be used for inverse reinforcement learning (IRL). Our method, Inverse soft-Q learning (IQ-Learn) obtains state-of-the-art results in offline and online imitation learning settings, surpassing existing methods both in the number of required environment interactions and scalability in high-dimensional spaces.
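"单个Q函数隐式表示奖励与策略"这一点,可以在表格型、确定性转移的简化设定下用 soft 值函数与逆 soft Bellman 关系示意(这只是对论文思想的玩具演示,数值与转移均为假设,并非其算法本身):

```python
import numpy as np

Q = np.array([[1.0, 0.2],
              [0.5, 0.5]])          # Q[s, a] for 2 states x 2 actions
nxt = np.array([[0, 1],
                [1, 0]])            # assumed deterministic transitions s' = nxt[s, a]
gamma = 0.9

V = np.log(np.exp(Q).sum(axis=1))   # soft value: V(s) = log sum_a exp Q(s, a)
pi = np.exp(Q - V[:, None])         # soft-optimal policy implied by Q (rows sum to 1)
r = Q - gamma * V[nxt]              # reward recovered from Q via the soft Bellman relation
```

这样,学到一个 Q 就同时得到策略 pi 和隐式奖励 r,无需对奖励与策略做对抗式联合优化;隐式奖励与真实奖励的高相关性也使该方法可用于逆强化学习。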

【17】 Learning Identity-Preserving Transformations on Data Manifolds 标题:学习数据流形上的保恒等变换

作者:Marissa Connor,Kion Fallah,Christopher Rozell 机构:School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, GA 链接:https://arxiv.org/abs/2106.12096 摘要:许多机器学习技术在模型中加入了保持身份的变换,将其性能推广到以前看不到的数据中。这些变换通常是从一组已知的函数中选择的,这些函数在应用时可以保持输入的一致性(例如,旋转、平移、翻转和缩放)。然而,有许多自然变化不能被标记以供监督或通过检查数据来定义。正如流形假说所建议的,许多这些自然变化存在于或接近低维非线性流形上。有几种技术通过一组学习的李群算子来表示流形变化,李群算子定义流形上的运动方向。然而,这些方法是有限的,因为它们在训练模型时需要变换标签,并且它们缺乏一种方法来确定流形的哪些区域适合应用每个特定的操作符。我们通过引入一种不需要变换标签的学习策略来解决这些限制,并开发了一种方法来学习每个操作符可能被使用的局部区域,同时保持输入的身份。在MNIST和Fashion-MNIST上的实验突出了我们的模型在多类数据集上学习身份保持转换的能力。此外,我们在CelebA上进行训练,以展示我们的模型以无监督的方式学习复杂数据集上语义上有意义的转换的能力。 摘要:Many machine learning techniques incorporate identity-preserving transformations into their models to generalize their performance to previously unseen data. These transformations are typically selected from a set of functions that are known to maintain the identity of an input when applied (e.g., rotation, translation, flipping, and scaling). However, there are many natural variations that cannot be labeled for supervision or defined through examination of the data. As suggested by the manifold hypothesis, many of these natural variations live on or near a low-dimensional, nonlinear manifold. Several techniques represent manifold variations through a set of learned Lie group operators that define directions of motion on the manifold. However theses approaches are limited because they require transformation labels when training their models and they lack a method for determining which regions of the manifold are appropriate for applying each specific operator. We address these limitations by introducing a learning strategy that does not require transformation labels and developing a method that learns the local regions where each operator is likely to be used while preserving the identity of inputs. Experiments on MNIST and Fashion MNIST highlight our model's ability to learn identity-preserving transformations on multi-class datasets. 
Additionally, we train on CelebA to showcase our model's ability to learn semantically meaningful transformations on complex datasets in an unsupervised manner.
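To make the Lie-group-operator idea above concrete, here is a minimal sketch (hypothetical, not the paper's implementation): a learned generator matrix A defines a direction of motion on the manifold, and the matrix exponential expm(c·A) transports a point along it. In 2-D, the rotation generator below yields an identity-preserving rotation; the generator, latent point, and helper names are all made up for illustration.

```python
import numpy as np

def matrix_exp(A, terms=30):
    """Truncated power series for the matrix exponential expm(A)."""
    out = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, terms):
        term = term @ A / k
        out = out + term
    return out

def transport(z, A, c):
    """Move point z along the manifold direction defined by generator A
    with coefficient c: z' = expm(c * A) @ z."""
    return matrix_exp(c * A) @ z

# A hypothetical learned generator: in 2-D this one generates rotations,
# an identity-preserving transformation for many image classes.
A = np.array([[0.0, -1.0],
              [1.0,  0.0]])
z = np.array([1.0, 0.0])
z_rot = transport(z, A, np.pi / 2)  # quarter turn
```

In the paper's setting the generators are learned from data rather than fixed, and a separate component decides which generator applies in which local region of the manifold.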

【18】 BFTrainer: Low-Cost Training of Neural Networks on Unfillable Supercomputer Nodes 标题:BFTrainer:在不可填满的超级计算机节点上进行神经网络的低成本训练

作者:Zhengchun Liu,Rajkumar Kettimuthu,Michael E. Papka,Ian Foster 机构:Argonne National Laboratory, Northern Illinois University, The University of Chicago 链接:https://arxiv.org/abs/2106.12091 摘要:基于FCFS的超级计算机调度策略会导致许多瞬时空闲节点,这种现象只能通过回填调度方法得到部分缓解,这种方法可以促使小作业先于大作业运行。在这里,我们描述如何为这些原本被浪费的资源实现一种新用途,即深度神经网络(DNN)训练。这个重要的工作负载很容易组织成许多小片段,这些片段可以动态配置以适应超级计算机时间表中的任何节点×时间空洞。我们描述了如何将"重新调整合适的DNN训练任务以适应动态变化的空洞"这一任务,表述为一个确定性的、基于混合整数线性规划(MILP)的资源分配问题,并证明该MILP问题可以在运行时被高效求解。我们将进一步说明如何将这个MILP问题调整为针对管理员或用户定义的度量进行优化。我们用超级计算机调度程序日志和不同的DNN训练场景验证了我们的方法,与在专用节点上运行相同的训练任务相比,效率高达93%。因此,我们的方法可以将大量的超级计算机资源分配给DNN训练,而不会对其他应用程序产生影响。 摘要:Supercomputer FCFS-based scheduling policies result in many transient idle nodes, a phenomenon that is only partially alleviated by backfill scheduling methods that promote small jobs to run before large jobs. Here we describe how to realize a novel use for these otherwise wasted resources, namely, deep neural network (DNN) training. This important workload is easily organized as many small fragments that can be configured dynamically to fit essentially any node*time hole in a supercomputer's schedule. We describe how the task of rescaling suitable DNN training tasks to fit dynamically changing holes can be formulated as a deterministic mixed integer linear programming (MILP)-based resource allocation algorithm, and show that this MILP problem can be solved efficiently at run time. We show further how this MILP problem can be adapted to optimize for administrator- or user-defined metrics. We validate our method with supercomputer scheduler logs and different DNN training scenarios, and demonstrate efficiencies of up to 93% compared with running the same training tasks on dedicated nodes. Our method thus enables substantial supercomputer resources to be allocated to DNN training with no impact on other applications.
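The node*time packing problem the abstract describes can be illustrated with a toy greedy loop. The paper solves this with a MILP at run time; the fragment sizes and job names below are made up, and plain greedy packing only sketches the problem, not the paper's algorithm.

```python
def fill_holes(holes, jobs):
    """Greedy sketch: pack unit-size DNN-training fragments into idle
    node*time holes. `holes` is a list of (nodes, minutes); `jobs` maps
    job id -> remaining fragment count, where one fragment occupies one
    node for one minute. Returns the schedule and leftover capacity."""
    schedule = []
    wasted = 0
    for nodes, minutes in holes:
        capacity = nodes * minutes
        for job in list(jobs):
            take = min(capacity, jobs[job])
            if take:
                schedule.append((job, take))
                jobs[job] -= take
                capacity -= take
            if jobs[job] == 0:
                del jobs[job]
        wasted += capacity
    return schedule, wasted

# Two holes (2 nodes x 3 min, 1 node x 4 min) and two hypothetical jobs.
sched, wasted = fill_holes([(2, 3), (1, 4)], {"jobA": 5, "jobB": 4})
```

A MILP formulation would instead choose fragment placements jointly, which is what lets the paper optimize administrator- or user-defined metrics rather than a fixed greedy order.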

【19】 Q-Learning Lagrange Policies for Multi-Action Restless Bandits 标题:多行动无休止土匪的Q-学习Lagrange策略

作者:Jackson A. Killian,Arpita Biswas,Sanket Shah,Milind Tambe 机构:Harvard University, Cambridge, MA, USA 备注:13 pages, 6 figures, to be published in Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data 链接:https://arxiv.org/abs/2106.12024 摘要:多行动不安多武装土匪(RMAB)是一个强大的受约束资源分配框架,其中管理着$N$个独立进程。然而,以往的工作只研究了问题动力学已知的离线设置。针对这一限制性假设,我们设计了第一个基于拉格朗日松弛和Q-学习相结合的多动作RMAB在线学习策略的算法。我们的第一种方法,MAIQL,将二元动作RMAB中Whittle索引的Q-学习方法扩展到多动作设置。我们导出了一个广义更新规则和收敛性证明,并证明在标准假设下,当$t\rightarrow{}\infty$时,MAIQL收敛到渐近最优的多动作RMAB策略。然而,MAIQL依赖于在两个时间尺度上学习Q函数和索引,这导致收敛速度慢,并且要求问题结构能够很好地执行。因此,我们设计了第二种算法LPQL,它通过Q-学习的一种变体来最小化Lagrange界,从而学习多动作RMABs的性能良好且更通用的Lagrange策略。 摘要:Multi-action restless multi-armed bandits (RMABs) are a powerful framework for constrained resource allocation in which $N$ independent processes are managed. However, previous work only studies the offline setting where problem dynamics are known. We address this restrictive assumption, designing the first algorithms for learning good policies for Multi-action RMABs online using combinations of Lagrangian relaxation and Q-learning. Our first approach, MAIQL, extends a method for Q-learning the Whittle index in binary-action RMABs to the multi-action setting. We derive a generalized update rule and convergence proof and establish that, under standard assumptions, MAIQL converges to the asymptotically optimal multi-action RMAB policy as $t\rightarrow{}\infty$. However, MAIQL relies on learning Q-functions and indexes on two timescales which leads to slow convergence and requires problem structure to perform well. Thus, we design a second algorithm, LPQL, which learns the well-performing and more general Lagrange policy for multi-action RMABs by learning to minimize the Lagrange bound through a variant of Q-learning. 
To ensure fast convergence, we take an approximation strategy that enables learning on a single timescale, then give a guarantee relating the approximation's precision to an upper bound of LPQL's return as $t\rightarrow{}\infty$. Finally, we show that our approaches always outperform baselines across multiple settings, including one derived from real-world medication adherence data.
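A hypothetical minimal form of the Lagrangian Q-update underlying LPQL-style methods: the agent learns on the penalized reward r − λ·cost. This is only a sketch of the core idea — the actual algorithm also adapts λ itself and coordinates many arms under a shared budget.

```python
def lagrange_q_update(Q, s, a, r, cost, s_next, lam, alpha=0.1, gamma=0.9):
    """One tabular Q-learning step on the Lagrangian reward r - lam * cost
    (hypothetical minimal form; the paper couples this with learning lam
    and with the multi-arm RMAB structure)."""
    target = (r - lam * cost) + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])
    return Q

# Two states, two actions (don't act / act); acting costs 1 resource unit.
Q = {0: [0.0, 0.0], 1: [0.0, 0.0]}
Q = lagrange_q_update(Q, s=0, a=1, r=1.0, cost=1.0, s_next=1, lam=0.5)
```

With λ = 0.5 the effective reward for acting is 0.5, so the single update moves Q[0][1] to α·0.5 = 0.05; raising λ makes costly actions less attractive, which is how the Lagrange bound trades reward against the resource constraint.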

【20】 High-Throughput Precision Phenotyping of Left Ventricular Hypertrophy with Cardiovascular Deep Learning 标题:心血管深度学习高通量高精度左心室肥厚表型研究

作者:Grant Duffy,Paul P Cheng,Neal Yuan,Bryan He,Alan C. Kwan,Matthew J. Shun-Shin,Kevin M. Alexander,Joseph Ebinger,Matthew P. Lungren,Florian Rader,David H. Liang,Ingela Schnittger,Euan A. Ashley,James Y. Zou,Jignesh Patel,Ronald Witteles,Susan Cheng,David Ouyang 机构:. Department of Cardiology, Smidt Heart Institute, Cedars-Sinai Medical Center, . Department of Medicine, Division of Cardiology, Stanford University, . Department of Computer Science, Stanford University 链接:https://arxiv.org/abs/2106.12511 摘要:左心室肥大(LVH)是由一系列系统性和心血管疾病引起的慢性重构引起的,这些疾病包括高血压、主动脉狭窄、肥厚型心肌病和心脏淀粉样变。LVH的早期发现和特征化可以显著影响患者的护理,但由于对肥厚认识不足、测量误差和变异性以及难以区分LVH的病因而受到限制。为了克服这一挑战,我们提出了EchoNet LVH-一个深度学习的工作流程,它可以自动量化心室肥大,精确度与人类专家相当,并预测LVH的病因。在28201个超声心动图视频上训练,我们的模型精确测量了心室壁厚度(平均绝对误差[MAE]1.4mm,95%CI 1.2-1.5mm),左心室直径(MAE 2.4mm,95%CI 2.2-2.6mm)和后壁厚度(MAE 1.2mm,95%ci1.1-1.3mm),并将心肌淀粉样变(曲线下面积0.83)和肥厚型心肌病(auc0.98)与LVH的其他病因进行分类。在来自独立的国内和国际医疗系统的外部数据集中,EchoNet LVH准确地量化了心室参数(R2分别为0.96和0.90),并在国内外部验证站点检测到心脏淀粉样变(AUC 0.79)和肥厚型心肌病(AUC 0.89)。利用多个心跳的测量,我们的模型可以更准确地识别LV几何结构的细微变化及其病因。与人类专家相比,ECHONET LVH是完全自动化的,允许重复的、精确的测量,并为心肌肥厚的精确诊断奠定基础。作为促进进一步创新的资源,我们还公开了23212个带注释的超声心动图视频的大数据集。 摘要:Left ventricular hypertrophy (LVH) results from chronic remodeling caused by a broad range of systemic and cardiovascular disease including hypertension, aortic stenosis, hypertrophic cardiomyopathy, and cardiac amyloidosis. Early detection and characterization of LVH can significantly impact patient care but is limited by under-recognition of hypertrophy, measurement error and variability, and difficulty differentiating etiologies of LVH. To overcome this challenge, we present EchoNet-LVH - a deep learning workflow that automatically quantifies ventricular hypertrophy with precision equal to human experts and predicts etiology of LVH. 
Trained on 28,201 echocardiogram videos, our model accurately measures intraventricular wall thickness (mean absolute error [MAE] 1.4mm, 95% CI 1.2-1.5mm), left ventricular diameter (MAE 2.4mm, 95% CI 2.2-2.6mm), and posterior wall thickness (MAE 1.2mm, 95% CI 1.1-1.3mm) and classifies cardiac amyloidosis (area under the curve of 0.83) and hypertrophic cardiomyopathy (AUC 0.98) from other etiologies of LVH. In external datasets from independent domestic and international healthcare systems, EchoNet-LVH accurately quantified ventricular parameters (R2 of 0.96 and 0.90 respectively) and detected cardiac amyloidosis (AUC 0.79) and hypertrophic cardiomyopathy (AUC 0.89) on the domestic external validation site. Leveraging measurements across multiple heart beats, our model can more accurately identify subtle changes in LV geometry and its causal etiologies. Compared to human experts, EchoNet-LVH is fully automated, allowing for reproducible, precise measurements, and lays the foundation for precision diagnosis of cardiac hypertrophy. As a resource to promote further innovation, we also make publicly available a large dataset of 23,212 annotated echocardiogram videos.

【21】 Calibrating the Lee-Carter and the Poisson Lee-Carter models via Neural Networks 标题:用神经网络标定Lee-Carter和Poisson Lee-Carter模型

作者:Salvatore Scognamiglio 链接:https://arxiv.org/abs/2106.12312 摘要:本文介绍了一种神经网络方法来拟合多种群的Lee-Carter和Poisson-Lee-Carter模型。我们开发了一些神经网络来复制单个LC模型的结构,并通过同时分析所有考虑人群的死亡率数据来进行联合拟合。神经网络体系结构是专门设计用来校准每个单独的模型使用所有可用的信息,而不是使用人口特定的数据子集在传统的估计方案。在人类死亡率数据库(HMD)的所有国家进行的大量数值实验表明了该方法的有效性。特别是,由此产生的参数估计似乎很平稳,对死亡率数据中经常出现的随机波动不太敏感,特别是对于低人口国家。此外,预测效果也有显著提高。 摘要:This paper introduces a neural network approach for fitting the Lee-Carter and the Poisson Lee-Carter model on multiple populations. We develop some neural networks that replicate the structure of the individual LC models and allow their joint fitting by analysing the mortality data of all the considered populations simultaneously. The neural network architecture is specifically designed to calibrate each individual model using all available information instead of using a population-specific subset of data as in the traditional estimation schemes. A large set of numerical experiments performed on all the countries of the Human Mortality Database (HMD) shows the effectiveness of our approach. In particular, the resulting parameter estimates appear smooth and less sensitive to the random fluctuations often present in the mortality rates' data, especially for low-population countries. In addition, the forecasting performance results significantly improved as well.
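For reference, the Lee-Carter structure the networks replicate is log m(x,t) = a_x + b_x·k_t. The classical single-population calibration (which the paper replaces with a joint neural fit across populations) centres the log-rates and takes a rank-1 SVD; the sketch below runs this on synthetic, noise-free rates, so it recovers the model exactly. All parameter values here are invented for illustration.

```python
import numpy as np

# Lee-Carter: log m(x,t) = a_x + b_x * k_t, with sum(b)=1 for identifiability.
ages, years = 5, 10
a = np.linspace(-6.0, -2.0, ages)          # age profile
b = np.full(ages, 1.0 / ages)              # age sensitivities (sum to 1)
k = np.linspace(2.0, -2.0, years)          # period (time) index
log_m = a[:, None] + b[:, None] * k[None, :]

# Classical calibration: centre over time, then rank-1 SVD.
a_hat = log_m.mean(axis=1)
U, S, Vt = np.linalg.svd(log_m - a_hat[:, None], full_matrices=False)
b_hat = U[:, 0] / U[:, 0].sum()            # impose sum(b_hat) = 1
k_hat = S[0] * Vt[0] * U[:, 0].sum()
recon = a_hat[:, None] + b_hat[:, None] * k_hat[None, :]
```

On real mortality data the centred matrix is only approximately rank 1, and fitting each population separately wastes shared structure — which is the gap the paper's joint neural calibration targets.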

其他(12篇)

【1】 Training Data Subset Selection for Regression with Controlled Generalization Error 标题:具有受控泛化误差的回归训练数据子集选择

作者:Durga Sivasubramanian,Rishabh Iyer,Ganesh Ramakrishnan,Abir De 备注:None 链接:https://arxiv.org/abs/2106.12491 摘要:从大量训练实例中选择数据子集是一种高效、经济的机器学习方法。然而,在较小子集上训练的模型泛化能力较差。在本文中,我们的目标是设计一个选择训练数据子集的算法,以便在不显著牺牲精度的情况下快速训练模型。更具体地说,我们专注于L2正则化回归问题的数据子集选择,并提供了一个新的问题公式,该公式寻求在受验证集误差界约束的条件下,最小化关于可训练参数和训练数据子集的训练损失。我们通过一些技术创新来解决这个问题。首先,我们使用原始训练问题的对偶,用简化的约束来表示这个问题,并且证明这个新表示的目标是一个单调的α-子模函数,且这一结论对各种各样的建模选择均成立。这样的性质使得我们开发了SELCON,一种用于数据子集选择的高效主化-最小化(majorization-minimization)算法,即使训练只能为训练模型提供不精确的估计,它也能给出近似保证。最后,我们在多个数据集上的实验表明,SELCON比目前的最新技术更有效地权衡了准确性和效率。 摘要:Data subset selection from a large number of training instances has been a successful approach toward efficient and cost-effective machine learning. However, models trained on a smaller subset may show poor generalization ability. In this paper, our goal is to design an algorithm for selecting a subset of the training data, so that the model can be trained quickly, without significantly sacrificing on accuracy. More specifically, we focus on data subset selection for L2 regularized regression problems and provide a novel problem formulation which seeks to minimize the training loss with respect to both the trainable parameters and the subset of training data, subject to error bounds on the validation set. We tackle this problem using several technical innovations. First, we represent this problem with simplified constraints using the dual of the original training problem and show that the objective of this new representation is a monotone and alpha-submodular function, for a wide variety of modeling choices. Such properties lead us to develop SELCON, an efficient majorization-minimization algorithm for data subset selection, that admits an approximation guarantee even when the training provides an imperfect estimate of the trained model. Finally, our experiments on several datasets show that SELCON trades off accuracy and efficiency more effectively than the current state-of-the-art.
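The monotone, approximately submodular structure the paper exploits is exactly what makes greedy-style selection well behaved. The toy sketch below runs plain greedy on a coverage function — it illustrates the mechanics of submodular subset selection, not SELCON's majorization-minimization scheme, and the coverage sets are invented.

```python
def greedy_submodular(ground, f, budget):
    """Greedy maximization of a monotone (approximately) submodular set
    function f: repeatedly add the element with the largest marginal gain."""
    S = set()
    for _ in range(budget):
        best = max((x for x in ground if x not in S),
                   key=lambda x: f(S | {x}) - f(S))
        S.add(best)
    return S

# Toy coverage function: how many validation "groups" the chosen subset covers.
covers = {1: {"a"}, 2: {"a", "b"}, 3: {"c"}}
f = lambda S: len(set().union(*(covers[x] for x in S)) if S else set())
S = greedy_submodular({1, 2, 3}, f, budget=2)
```

Greedy picks 2 first (covers two groups), then 3 (only remaining positive gain); element 1 is redundant. For α-submodular objectives, greedy retains a degraded but nonzero approximation factor, which is the kind of guarantee SELCON's analysis builds on.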

【2】 Real-time Neural Radiance Caching for Path Tracing 标题:用于路径跟踪的实时神经辐射度缓存

作者:Thomas Müller,Fabrice Rousselle,Jan Novák,Alexander Keller 机构:Path tracing, + ReSTIR, + NRC (Ours), Reference, fps 备注:To appear at SIGGRAPH 2021. 16 pages, 16 figures 链接:https://arxiv.org/abs/2106.12372 摘要:提出了一种用于路径跟踪全局光照的实时神经辐射缓存方法。我们的系统设计用于处理完全动态的场景,并且不需要对灯光、几何体和材质进行任何假设。我们的方法的数据驱动特性回避了缓存算法的许多困难,例如定位、插值和更新缓存点。由于预训练神经网络来处理新颖、动态的场景是一项艰巨的泛化挑战,我们放弃了预训练,而是通过自适应来实现泛化,即在渲染时选择训练辐射缓存。我们使用自训练来提供低噪声训练目标,并通过迭代少量的反弹训练更新来模拟无限反弹传输。由于充分利用现代硬件的神经网络的流式实现,更新和缓存查询会产生轻微的开销(全高清分辨率约为2.6毫秒)。我们以较小的诱导偏差为代价演示了显著的噪声降低,并报告了在许多具有挑战性的场景中的最新实时性能。 摘要:We present a real-time neural radiance caching method for path-traced global illumination. Our system is designed to handle fully dynamic scenes, and makes no assumptions about the lighting, geometry, and materials. The data-driven nature of our approach sidesteps many difficulties of caching algorithms, such as locating, interpolating, and updating cache points. Since pretraining neural networks to handle novel, dynamic scenes is a formidable generalization challenge, we do away with pretraining and instead achieve generalization via adaptation, i.e. we opt for training the radiance cache while rendering. We employ self-training to provide low-noise training targets and simulate infinite-bounce transport by merely iterating few-bounce training updates. The updates and cache queries incur a mild overhead -- about 2.6ms on full HD resolution -- thanks to a streaming implementation of the neural network that fully exploits modern hardware. We demonstrate significant noise reduction at the cost of little induced bias, and report state-of-the-art, real-time performance on a number of challenging scenarios.

【3】 Random Effect Bandits 标题:随机效应土匪

作者:Rong Zhu,Branislav Kveton 机构:Institute of Science and Technology for Brain-inspired Intelligence, Fudan University, Shanghai, China, MOE Frontiers Center for Brain Science, Fudan University, Google Research, Mountain View, CA, USA 链接:https://arxiv.org/abs/2106.12200 摘要:本文研究了一个经典的在线学习问题——多武装土匪的后悔最小化问题。为了开发更有效的统计算法,我们建议使用随机效应模型的假设。在这个模型中,武器的平均报酬是独立于我们估计的参数的未知分布而得出的。我们给出了该模型中arm均值的估计量,并对其不确定性进行了分析。基于这些结果,我们设计了一个UCB算法,我们称之为ReUCB。我们分析了ReUCB,并证明了它的$n$轮遗憾上的Bayes遗憾界与现有的下限相匹配。我们的实验表明,ReUCB可以在各种情况下优于Thompson抽样,而不必假设arm均值的先验分布是已知的。 摘要:This paper studies regret minimization in multi-armed bandits, a classical online learning problem. To develop more statistically-efficient algorithms, we propose to use the assumption of a random-effect model. In this model, the mean rewards of arms are drawn independently from an unknown distribution, whose parameters we estimate. We provide an estimator of the arm means in this model and also analyze its uncertainty. Based on these results, we design a UCB algorithm, which we call ReUCB. We analyze ReUCB and prove a Bayes regret bound on its $n$-round regret, which matches an existing lower bound. Our experiments show that ReUCB can outperform Thompson sampling in various scenarios, without assuming that the prior distribution of arm means is known.
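A hedged sketch of what a random-effect (shrinkage) UCB index might look like: the empirical arm mean is pulled toward an estimated population mean before a confidence width is added. This is an illustrative form consistent with the random-effect model described above, not the paper's exact ReUCB index, and every parameter value below is invented.

```python
import math

def shrinkage_ucb_index(mean_i, n_i, mu0, tau2, sigma2, t):
    """UCB index under the random-effect model: arm means are drawn from a
    population with mean mu0 and variance tau2; observation noise variance
    is sigma2. The posterior mean shrinks the empirical mean toward mu0,
    and the width uses the posterior precision (hypothetical minimal form)."""
    precision = n_i / sigma2 + 1.0 / tau2
    post_mean = (n_i / sigma2 * mean_i + mu0 / tau2) / precision
    width = math.sqrt(2.0 * math.log(t) / precision)
    return post_mean + width

idx = shrinkage_ucb_index(mean_i=0.8, n_i=10, mu0=0.5, tau2=0.04, sigma2=1.0, t=100)
```

Intuitively, small-sample arms borrow strength from the population estimate, which is where the statistical efficiency over ordinary UCB comes from; in the actual algorithm μ0 and τ² are themselves estimated from all arms.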

【4】 A Review of Assistive Technologies for Activities of Daily Living of Elderly 标题:老年人日常生活活动辅助技术综述

作者:Nirmalya Thakur,Chia Y. Han 机构:Department of Electrical Engineering and Computer Science, College of Engineering and Applied Sciences, University of Cincinnati, Ohio, US 备注:None 链接:https://arxiv.org/abs/2106.12183 摘要:本世纪的一个显著特点是老年人口不断增加。随着年龄的增长,老年人由于身体残疾、认知问题、记忆力减退和行为紊乱而有多种需求和要求。这些限制的程度也因年龄、性别、背景、经验、技能、知识等不同而有所不同。随着年龄的增长,这些不同的需求和挑战限制了老年人独立进行日常生活活动的能力。此外,护理人员的短缺使老年人迫切需要以技术为基础的服务,帮助他们完成日常工作,以维持他们的独立生活和积极老龄化。为了满足这些需要,这项工作包括在这一领域作出三大贡献。首先,它提供了一个相当全面的审查辅助生活技术,旨在帮助老年人进行日常生活能力。其次,工作讨论了通过本次审查确定的挑战,这些挑战目前存在于智能家居和智能城市中实施老年人护理辅助生活服务的背景下。最后,该工作还概述了实施、扩展和整合该领域现有工作的方法,以便开发一个急需的框架,能够根据老年人不同和不断变化的需求为他们提供个性化的帮助和以用户为中心的行为干预。 摘要:One of the distinct features of this century has been the population of older adults which has been on a constant rise. Elderly people have several needs and requirements due to physical disabilities, cognitive issues, weakened memory and disorganized behavior, that they face with increasing age. The extent of these limitations also differs according to the varying diversities in elderly, which include age, gender, background, experience, skills, knowledge and so on. These varying needs and challenges with increasing age, limits abilities of older adults to perform Activities of Daily Living (ADLs) in an independent manner. To add to it, the shortage of caregivers creates a looming need for technology-based services for elderly people, to assist them in performing their daily routine tasks to sustain their independent living and active aging. To address these needs, this work consists of making three major contributions in this field. First, it provides a rather comprehensive review of assisted living technologies aimed at helping elderly people to perform ADLs. Second, the work discusses the challenges identified through this review, that currently exist in the context of implementation of assisted living services for elderly care in Smart Homes and Smart Cities. 
Finally, the work also outlines an approach for implementation, extension and integration of the existing works in this field for development of a much-needed framework that can provide personalized assistance and user-centered behavior interventions to elderly as per their varying and ever-changing needs.

【5】 Deep Gaussian Processes: A Survey 标题:深高斯过程:综述

作者:Kalvik Jakkala 机构:Department of Computer Science, University of North Carolina at Charlotte, Charlotte, NC, USA 备注:23 pages, 5 figures 链接:https://arxiv.org/abs/2106.12135 摘要:高斯过程是贝叶斯学习的主要方法之一。虽然这种方法已成功地应用于许多问题,但它有一些基本的局限性。文献中的多种方法解决了这些局限性。然而,到目前为止,还没有对这些主题进行全面的调查。现有的大多数综述只关注高斯过程及其衍生方法的某一种特定变体。本综述详细说明了使用高斯过程的核心动机,它们的数学公式,局限性,以及多年来为解决上述局限性而蓬勃发展的研究主题。此外,深高斯过程(DGPs)是一个特殊的研究领域,在过去的十年中得到了很大的发展。本综述概述了推动这一研究领域前沿的重要出版物。最后,对存在的问题和今后的研究方向进行了简要的讨论。 摘要:Gaussian processes are one of the dominant approaches in Bayesian learning. Although the approach has been applied to numerous problems with great success, it has a few fundamental limitations. Multiple methods in literature have addressed these limitations. However, there has not been a comprehensive survey of the topics as of yet. Most existing surveys focus on only one particular variant of Gaussian processes and their derivatives. This survey details the core motivations for using Gaussian processes, their mathematical formulations, limitations, and research themes that have flourished over the years to address said limitations. Furthermore, one particular research area, Deep Gaussian Processes (DGPs), has improved substantially in the past decade. The significant publications that advanced the forefront of this research area are outlined in this survey. Finally, a brief discussion on open problems and research directions for future work is presented at the end.

【6】 Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training 标题:在空间上结构化,在时间上随机化:利用RNN中的辍学进行高效训练

作者:Anup Sarma,Sonali Singh,Huaipan Jiang,Rui Zhang,Mahmut T Kandemir,Chita R Das 机构:Mahmut T. Kandemir, Chita R. Das, Pennsylvania State University 链接:https://arxiv.org/abs/2106.12089 摘要:递归神经网络(RNN),更具体地说是其长-短期记忆(LSTM)变体,已被广泛用作处理文本和语音中基于序列的学习任务的深度学习工具。这种LSTM应用程序的训练是计算密集型的,这是由于隐藏状态计算在每个时间步重复的循环性质。虽然深度神经网络中的稀疏性被广泛认为是减少训练和推理阶段计算时间的一个机会,但是在LSTM-RNNs中使用非ReLU激活使得这种与神经元激活和梯度值相关的动态稀疏性的机会是有限的或不存在的。在这项工作中,我们确定辍学引起的稀疏为一个合适的模式,减少计算。辍学是一种广泛使用的正则化机制,它在训练的每次迭代中随机丢弃计算出的神经元值。我们建议通过在一批中删除同一组物理神经元来构造丢失模式,从而产生列(行)级隐藏状态稀疏性,在通用SIMD硬件和脉动阵列中,这种稀疏性非常适合在运行时减少计算量。我们对三个具有代表性的NLP任务进行了实验:PTB数据集上的语言建模,使用IWSLT-De-En和En-Vi数据集的基于OpenNMT的机器翻译,以及使用CoNLL-2003共享任务的命名实体识别序列标记。我们证明了我们提出的方法可以在不牺牲目标度量的情况下,将基于辍学的计算缩减转化为减少的训练时间,其改进范围为1.23x到1.64x。 摘要:Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applications is computationally intensive due to the recurrent nature of hidden state computation that repeats for each time step. While sparsity in Deep Neural Nets has been widely seen as an opportunity for reducing computation time in both training and inference phases, the usage of non-ReLU activation in LSTM RNNs renders the opportunities for such dynamic sparsity associated with neuron activation and gradient values to be limited or non-existent. In this work, we identify dropout induced sparsity for LSTMs as a suitable mode of computation reduction. Dropout is a widely used regularization mechanism, which randomly drops computed neuron values during each iteration of training. We propose to structure dropout patterns, by dropping out the same set of physical neurons within a batch, resulting in column (row) level hidden state sparsity, which are well amenable to computation reduction at run-time in general-purpose SIMD hardware as well as systolic arrays. 
We conduct our experiments for three representative NLP tasks: language modelling on the PTB dataset, OpenNMT based machine translation using the IWSLT De-En and En-Vi datasets, and named entity recognition sequence labelling using the CoNLL-2003 shared task. We demonstrate that our proposed approach can be used to translate dropout-based computation reduction into reduced training time, with improvement ranging from 1.23x to 1.64x, without sacrificing the target metric.
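The core trick — sharing one dropout mask across the batch so whole hidden-state columns go to zero — can be sketched in a few lines. This is a hypothetical standalone form outside any LSTM; in the paper the mask is applied to the recurrent hidden state at each time step.

```python
import numpy as np

def structured_dropout(h, p, rng):
    """Drop the SAME hidden units for every sequence in the batch, so whole
    columns of the (batch, hidden) state are zero and can be skipped by
    SIMD/systolic hardware. Standard dropout would instead sample an
    independent mask per row, giving unstructured sparsity."""
    keep = rng.random(h.shape[1]) >= p        # one mask, shared by the batch
    return h * keep / (1.0 - p)               # inverted-dropout scaling

rng = np.random.default_rng(0)
h = rng.standard_normal((4, 8))               # (batch, hidden)
out = structured_dropout(h, p=0.5, rng=rng)
zero_cols = np.all(out == 0.0, axis=0)        # entire columns are dropped
```

Because sparsity is column-aligned rather than scattered, the dropped columns can be removed from the matrix multiplies of the next step, which is what translates into the reported 1.23x-1.64x training-time reduction.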

【7】 A Practical & Unified Notation for Information-Theoretic Quantities in ML 标题:ML语言中信息论数量的一种实用统一表示法

作者:Andreas Kirsch,Yarin Gal 机构: University of Oxford 链接:https://arxiv.org/abs/2106.12062 摘要:信息论对机器学习很重要,但信息论量的符号有时是不透明的。正确的符号可以传达有价值的直觉和简洁地表达新的想法。我们为机器学习用户提出了这样一个表示法,并将其扩展到包含事件(结果)和随机变量之间的信息论量。我们将这个符号应用到贝叶斯主动学习中一个流行的信息理论获取函数中,该函数选择信息量最大的(未标记的)样本由专家标记。我们在将获取函数扩展到核心集问题时演示了符号的价值,核心集问题包括在给定标签的情况下选择信息量最大的样本。 摘要:Information theory is of importance to machine learning, but the notation for information-theoretic quantities is sometimes opaque. The right notation can convey valuable intuitions and concisely express new ideas. We propose such a notation for machine learning users and expand it to include information-theoretic quantities between events (outcomes) and random variables. We apply this notation to a popular information-theoretic acquisition function in Bayesian active learning which selects the most informative (unlabelled) samples to be labelled by an expert. We demonstrate the value of our notation when extending the acquisition function to the core-set problem, which consists of selecting the most informative samples given the labels.
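The acquisition function referred to here is BALD-style mutual information I(Y; W | x, D): the entropy of the mean predictive distribution minus the mean entropy over posterior samples. A minimal sketch with made-up posterior samples:

```python
import numpy as np

def bald_score(probs):
    """BALD mutual information I(Y; W | x) for one input. `probs` has shape
    (n_posterior_samples, n_classes): entropy of the averaged prediction
    minus the average per-sample entropy."""
    eps = 1e-12                                # avoid log(0)
    mean_p = probs.mean(axis=0)
    H_mean = -(mean_p * np.log(mean_p + eps)).sum()
    mean_H = -(probs * np.log(probs + eps)).sum(axis=1).mean()
    return H_mean - mean_H

# Posterior samples that disagree completely -> high epistemic information;
# samples that agree (even if each is uncertain) -> zero information.
s_disagree = bald_score(np.array([[1.0, 0.0], [0.0, 1.0]]))
s_agree = bald_score(np.array([[0.5, 0.5], [0.5, 0.5]]))
```

The two toy cases show why BALD targets disagreement between posterior samples rather than raw predictive uncertainty: the "agree" input is maximally uncertain yet carries no information about the model parameters.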

【8】 Learned Interpretable Residual Extragradient ISTA for Sparse Coding 标题:稀疏编码的学习可解释残差外梯度ISTA

作者:Lin Kong,Wei Sun,Fanhua Shang,Yuanyuan Liu,Hongying Liu 机构: School of Artificial Intelligence, Xidian University 备注:Accepted for presentation at the ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI 链接:https://arxiv.org/abs/2106.11970 摘要:近年来,学习型迭代收缩阈值算法(LISTA)的研究越来越受到人们的关注。大量的实验和理论证明了LISTA在解决稀疏编码问题上的高效性。然而,现有的LISTA方法都是串行连接。为了解决这个问题,我们提出了一种新的基于外梯度的LISTA(ELISTA),它具有残差结构和理论保证。特别是,我们的算法也能在一定程度上为ResNet提供可解释性。从理论上证明了该方法具有线性收敛性。在实践中,大量的实证结果验证了我们方法的优越性。 摘要:Recently, the study on learned iterative shrinkage thresholding algorithm (LISTA) has attracted increasing attention. A large number of experiments as well as some theories have proved the high efficiency of LISTA for solving sparse coding problems. However, existing LISTA methods are all serial connection. To address this issue, we propose a novel extragradient based LISTA (ELISTA), which has a residual structure and theoretical guarantees. In particular, our algorithm can also provide interpretability for ResNet to a certain extent. From a theoretical perspective, we prove that our method attains linear convergence. In practice, extensive empirical results verify the advantages of our method.
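For context, this is the classical ISTA iteration that LISTA-family networks unroll: each iteration becomes a network layer with learned weights, and ELISTA additionally adds residual/extragradient structure. The sketch below is plain ISTA on a trivial problem, not the learned variant.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of the L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def ista(A, y, lam, steps=200):
    """ISTA for min_x 0.5*||Ax - y||^2 + lam*||x||_1 — the iteration that
    LISTA (and ELISTA) turn into network layers with learned weights."""
    L = np.linalg.norm(A, 2) ** 2              # Lipschitz constant of the gradient
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        grad = A.T @ (A @ x - y)
        x = soft_threshold(x - grad / L, lam / L)
    return x

A = np.eye(3)
y = np.array([2.0, 0.3, -1.0])
x = ista(A, y, lam=0.5)  # for A = I this reduces to soft_threshold(y, 0.5)
```

Replacing the fixed matrices A.T and 1/L with trainable parameters per layer is what makes the unrolled network converge in far fewer "iterations" than classical ISTA.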

【9】 Sampling with Mirrored Stein Operators 标题:镜像Stein算子抽样

作者:Jiaxin Shi,Chang Liu,Lester Mackey 机构:Microsoft Research, Cambridge, MA, Beijing 备注:23 pages 链接:https://arxiv.org/abs/2106.12506 摘要:我们介绍了一个新的粒子演化采样器家族,适用于约束域和非欧几里德几何。Stein变分镜像下降和镜像Stein变分梯度下降使Kullback-Leibler(KL)散度最小化,从而使粒子在镜像映射定义的对偶空间中演化。Stein变分自然梯度法利用非欧几里德几何原理,有效地减小了KL对无约束目标的发散。我们从一类新的镜像Stein算子和自适应核中导出了这些采样器。我们证明了这些新的采样器可以精确地逼近单纯形上的分布,在后选择推理中提供有效的置信区间,并且在大规模无约束后验推理中比以前的方法收敛更快。最后,在目标分布的可验证条件下,证明了新方法的收敛性。 摘要:We introduce a new family of particle evolution samplers suitable for constrained domains and non-Euclidean geometries. Stein Variational Mirror Descent and Mirrored Stein Variational Gradient Descent minimize the Kullback-Leibler (KL) divergence to constrained target distributions by evolving particles in a dual space defined by a mirror map. Stein Variational Natural Gradient exploits non-Euclidean geometry to more efficiently minimize the KL divergence to unconstrained targets. We derive these samplers from a new class of mirrored Stein operators and adaptive kernels developed in this work. We demonstrate that these new samplers yield accurate approximations to distributions on the simplex, deliver valid confidence intervals in post-selection inference, and converge more rapidly than prior methods in large-scale unconstrained posterior inference. Finally, we establish the convergence of our new procedures under verifiable conditions on the target distribution.
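As background, the unconstrained Stein variational gradient descent update that the mirrored variants generalize moves each particle along a kernelized Stein direction: attraction toward high density plus inter-particle repulsion. A minimal numpy sketch (standard SVGD with an RBF kernel; the paper's methods additionally apply a mirror map for constrained domains):

```python
import numpy as np

def svgd_step(x, grad_logp, h=1.0, lr=0.1):
    """One SVGD update. x: (n, d) particles; grad_logp: callable returning
    grad log p at each particle. phi_i = (1/n) * sum_j [ k(x_j, x_i) *
    grad_logp(x_j) + grad_{x_j} k(x_j, x_i) ] with an RBF kernel."""
    n = x.shape[0]
    d2 = ((x[:, None, :] - x[None, :, :]) ** 2).sum(-1)
    K = np.exp(-d2 / (2 * h))
    grads = grad_logp(x)                                   # (n, d)
    # attraction toward high density + repulsion between particles
    phi = (K @ grads + (K.sum(1)[:, None] * x - K @ x) / h) / n
    return x + lr * phi

# Target: standard normal, so grad log p(x) = -x.
rng = np.random.default_rng(1)
x0 = rng.standard_normal((50, 1)) * 3.0
x1 = svgd_step(x0, lambda x: -x)
```

For a distribution constrained to the simplex, this update can step outside the domain — which is precisely what the mirrored Stein operators in the paper are designed to avoid.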

【10】 ParK: Sound and Efficient Kernel Ridge Regression by Feature Space Partitions 标题:PARK:基于特征空间划分的合理高效核岭回归

作者:Luigi Carratino,Stefano Vigogna,Daniele Calandriello,Lorenzo Rosasco 机构:MaLGa - DIBRIS, University of Genova, Italy, DeepMind Paris, France, Istituto Italiano di Tecnologia, Genova, Italy, CBMM - MIT, Cambridge, MA, USA 链接:https://arxiv.org/abs/2106.12231 摘要:介绍了一种新的大规模核岭回归求解器ParK。我们的方法结合了分区与随机投影和迭代优化,以减少空间和时间复杂度,同时可证明保持相同的统计精度。特别地,直接在特征空间而不是在输入空间中构造适当的划分,我们促进了局部估计量之间的正交性,从而确保了局部有效维数和偏差等关键量保持在控制之下。我们描述了我们模型的统计计算折衷,并通过大规模数据集的数值实验证明了我们方法的有效性。 摘要:We introduce ParK, a new large-scale solver for kernel ridge regression. Our approach combines partitioning with random projections and iterative optimization to reduce space and time complexity while provably maintaining the same statistical accuracy. In particular, constructing suitable partitions directly in the feature space rather than in the input space, we promote orthogonality between the local estimators, thus ensuring that key quantities such as local effective dimension and bias remain under control. We characterize the statistical-computational tradeoff of our model, and demonstrate the effectiveness of our method by numerical experiments on large-scale datasets.
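The exact O(n^3) kernel ridge regression baseline whose cost ParK reduces by feature-space partitioning looks like this — plain RBF kernel ridge regression on toy data, with the partitioning and random projections omitted. The kernel bandwidth and regularization values below are arbitrary choices for the example.

```python
import numpy as np

def krr_fit_predict(X, y, X_test, lam=1e-3, gamma=50.0):
    """Exact kernel ridge regression with an RBF kernel: solve
    (K + lam*I) alpha = y, predict with k(x_test, X) @ alpha. This O(n^3)
    solve is the cost ParK cuts by solving smaller local problems."""
    def rbf(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)
    K = rbf(X, X)
    alpha = np.linalg.solve(K + lam * np.eye(len(X)), y)
    return rbf(X_test, X) @ alpha

X = np.linspace(0, 1, 20)[:, None]
y = np.sin(2 * np.pi * X[:, 0])
pred = krr_fit_predict(X, y, X)     # in-sample fit of a smooth target
```

Partitioning in feature space replaces the single n x n solve with several much smaller ones; the paper's contribution is showing this can be done while keeping the statistical accuracy of the exact solver.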

【11】 Closed-Form, Provable, and Robust PCA via Leverage Statistics and Innovation Search 标题:通过利用统计数据和创新搜索实现封闭式、可证明且强大的PCA

作者:Mostafa Rahmani,Ping Li 机构:Cognitive Computing Lab, Baidu Research, NE ,th St. Bellevue, WA , USA 备注:Published in IEEE Transactions on Signal Processing. arXiv admin note: text overlap with arXiv:1912.12988 链接:https://arxiv.org/abs/2106.12190 摘要:新息搜索的思想最初被提出用于数据聚类,最近被用于离群点检测。在应用新息搜索进行离群点检测时,利用新息方向来度量数据点的新息。研究了二次成本函数下新息搜索算法计算的新息值,证明了新成本函数下的新息值等价于杠杆得分(Leverage Scores)。利用这一有趣的联系,为基于杠杆得分的鲁棒PCA方法建立了若干理论保证,并设计了一种新的鲁棒PCA方法。理论结果包括在离群点分布和内点(inlier)分布的不同模型下的性能保证。此外,我们还证明了算法对噪声的鲁棒性。数值和理论研究表明,该方法快速且具有闭式解,其性能优于现有的大多数算法。 摘要:The idea of Innovation Search, which was initially proposed for data clustering, was recently used for outlier detection. In the application of Innovation Search for outlier detection, the directions of innovation were utilized to measure the innovation of the data points. We study the Innovation Values computed by the Innovation Search algorithm under a quadratic cost function and it is proved that Innovation Values with the new cost function are equivalent to Leverage Scores. This interesting connection is utilized to establish several theoretical guarantees for a Leverage Score based robust PCA method and to design a new robust PCA method. The theoretical results include performance guarantees with different models for the distribution of outliers and the distribution of inliers. In addition, we demonstrate the robustness of the algorithms against the presence of noise. The numerical and theoretical studies indicate that while the presented approach is fast and closed-form, it can outperform most of the existing algorithms.
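The leverage scores in question are the diagonal entries of the hat matrix X(X^T X)^+ X^T, computable stably from the thin SVD; rows with unusually large scores stand out as outliers. A toy sketch on synthetic data (illustration only, not the paper's algorithm):

```python
import numpy as np

def leverage_scores(X):
    """Leverage score of row i = i-th diagonal entry of the hat matrix
    X (X^T X)^+ X^T, i.e. the squared norm of row i of the left singular
    vectors. Scores sum to rank(X)."""
    U, _, _ = np.linalg.svd(X, full_matrices=False)
    return (U ** 2).sum(axis=1)

rng = np.random.default_rng(0)
# Inliers concentrate near a 1-D subspace (second coordinate nearly zero).
inliers = rng.standard_normal((50, 2)) @ np.array([[1.0, 0.0], [0.0, 0.05]])
outlier = np.array([[0.0, 5.0]])          # leaves the dominant subspace
X = np.vstack([inliers, outlier])
scores = leverage_scores(X)               # last score is close to 1
```

The outlier essentially owns the second singular direction, so its leverage is near 1 while inliers share the first direction with scores around 1/n — the separation the Leverage-Score-based robust PCA method relies on.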

【12】 Pure Exploration in Kernel and Neural Bandits 标题:核函数与神经环中的纯探索

作者:Yinglun Zhu,Dongruo Zhou,Ruoxi Jiang,Quanquan Gu,Rebecca Willett,Robert Nowak 机构:University of Wisconsin-Madison, University of California, Los Angeles, University of Chicago 链接:https://arxiv.org/abs/2106.12034 摘要:我们研究土匪的纯探索,在土匪中,特征表示的维数可以远远大于武器的数目。为了克服维数灾难,我们建议自适应地将每个手臂的特征表示嵌入到低维空间中,并仔细处理诱导的模型错误。我们的方法在概念上与现有的只能处理低维线性强盗或被动处理模型错误的工作有很大不同。我们展示了我们的方法在两个纯探索环境中的应用,这两个纯探索环境是以前研究的:(1)奖励函数属于一个可能无限维的再生核Hilbert空间,(2)奖励函数是非线性的,可以用神经网络来逼近。我们的主要结果提供了样本复杂性保证,仅依赖于核或神经表示中特征空间的有效维数。在合成数据集和真实数据集上进行的大量实验证明了我们方法的有效性。 摘要:We study pure exploration in bandits, where the dimension of the feature representation can be much larger than the number of arms. To overcome the curse of dimensionality, we propose to adaptively embed the feature representation of each arm into a lower-dimensional space and carefully deal with the induced model misspecifications. Our approach is conceptually very different from existing works that can either only handle low-dimensional linear bandits or passively deal with model misspecifications. We showcase the application of our approach to two pure exploration settings that were previously under-studied: (1) the reward function belongs to a possibly infinite-dimensional Reproducing Kernel Hilbert Space, and (2) the reward function is nonlinear and can be approximated by neural networks. Our main results provide sample complexity guarantees that only depend on the effective dimension of the feature spaces in the kernel or neural representations. Extensive experiments conducted on both synthetic and real-world datasets demonstrate the efficacy of our methods.

本文参与 腾讯云自媒体分享计划,分享自微信公众号。
原始发表:2021-06-24,如有侵权请联系 cloudcommunity@tencent.com 删除

本文分享自 arXiv每日学术速递 微信公众号。