
Machine Learning Academic Digest [7.9]

By the WeChat official account "arXiv Daily Academic Digest"
Published 2021-07-27 10:43:54
From the column: arXiv Daily Academic Digest

cs.LG: 75 papers in total today

Graph-related (graph learning | graph neural networks | graph optimization, etc.) (1 paper)

【1】 Rating and aspect-based opinion graph embeddings for explainable recommendations

Authors: Iván Cantador, Andrés Carvallo, Fernando Diez
Affiliations: Universidad Autónoma de Madrid, Madrid, Spain; Pontificia Universidad Católica de Chile, Santiago, Chile
Note: arXiv admin note: substantial text overlap with arXiv:2107.03226
Link: https://arxiv.org/abs/2107.03385
Abstract: The success of neural network embeddings has entailed a renewed interest in using knowledge graphs for a wide variety of machine learning and information retrieval tasks. In particular, recent recommendation methods based on graph embeddings have shown state-of-the-art performance. In general, these methods encode latent rating patterns and content features. Differently from previous work, in this paper, we propose to exploit embeddings extracted from graphs that combine information from ratings and aspect-based opinions expressed in textual reviews. We then adapt and evaluate state-of-the-art graph embedding techniques over graphs generated from Amazon and Yelp reviews on six domains, outperforming baseline recommenders. Additionally, our method has the advantage of providing explanations that involve the coverage of aspect-based opinions given by users about recommended items.
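
The graph construction is the core idea here. As a rough illustration, such a heterogeneous rating/opinion graph might be assembled as follows; the node and edge schema is our own sketch, not the paper's exact design, and any off-the-shelf graph-embedding method can then be applied to it:

```python
import networkx as nx

# Hypothetical schema: users, items, and aspects as nodes; ratings and
# aspect-based opinions mined from reviews as typed edges.
G = nx.Graph()
G.add_edge("user:alice", "item:hotel_1", kind="rating", value=4)
G.add_edge("user:alice", "aspect:cleanliness", kind="opinion", sentiment=1)
G.add_edge("item:hotel_1", "aspect:cleanliness", kind="mentioned_in_review")

# Node embeddings learned on G can drive recommendation, while the aspect
# neighborhood of a recommended item supplies the explanation.
print(G.number_of_nodes(), G.number_of_edges())  # 3 nodes, 3 edges
```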

Transformer (3 papers)

【1】 Learning Vision-Guided Quadrupedal Locomotion End-to-End with Cross-Modal Transformers

Authors: Ruihan Yang, Minghao Zhang, Nicklas Hansen, Huazhe Xu, Xiaolong Wang
Affiliations: UC San Diego, Tsinghua University, UC Berkeley
Note: Our project page with videos is at this https URL
Link: https://arxiv.org/abs/2107.03996
Abstract: We propose to address quadrupedal locomotion tasks using Reinforcement Learning (RL) with a Transformer-based model that learns to combine proprioceptive information and high-dimensional depth sensor inputs. While learning-based locomotion has made great advances using RL, most methods still rely on domain randomization for training blind agents that generalize to challenging terrains. Our key insight is that proprioceptive states only offer contact measurements for immediate reaction, whereas an agent equipped with visual sensory observations can learn to proactively maneuver environments with obstacles and uneven terrain by anticipating changes in the environment many steps ahead. In this paper, we introduce LocoTransformer, an end-to-end RL method for quadrupedal locomotion that leverages a Transformer-based model for fusing proprioceptive states and visual observations. We evaluate our method in challenging simulated environments with different obstacles and uneven terrain. We show that our method obtains significant improvements over policies with only proprioceptive state inputs, and that Transformer-based models further improve generalization across environments. Our project page with videos is at https://RchalYang.github.io/LocoTransformer.
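
To make the fusion idea concrete, here is a minimal PyTorch sketch of a LocoTransformer-style cross-modal encoder. All sizes (proprioception dimension, patch count, action dimension) are placeholder values, and this is our reconstruction of the idea, not the authors' released code:

```python
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    # Placeholder sizes: 48-dim proprioception, 16x16 depth patches, 12 joints.
    def __init__(self, proprio_dim=48, patch_dim=256, d_model=128):
        super().__init__()
        self.proprio_proj = nn.Linear(proprio_dim, d_model)  # one proprio token
        self.patch_proj = nn.Linear(patch_dim, d_model)      # N visual tokens
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.policy_head = nn.Linear(d_model, 12)

    def forward(self, proprio, depth_patches):
        # proprio: (B, proprio_dim); depth_patches: (B, N, patch_dim)
        tokens = torch.cat([self.proprio_proj(proprio).unsqueeze(1),
                            self.patch_proj(depth_patches)], dim=1)
        fused = self.encoder(tokens)             # joint self-attention
        return self.policy_head(fused[:, 0])     # act from the proprio token

net = CrossModalFusion()
actions = net(torch.randn(8, 48), torch.randn(8, 36, 256))
print(actions.shape)  # torch.Size([8, 12])
```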

【2】 A Review of Bangla Natural Language Processing Tasks and the Utility of Transformer Models

Authors: Firoj Alam, Arid Hasan, Tanvirul Alam, Akib Khan, Jannatul Tajrin, Naira Khan, Shammur Absar Chowdhury
Affiliations: BJIT Limited, Bangladesh; Cognitive Insight Limited, Bangladesh
Note: Under review. Keywords: Bangla language processing, text classification, sequence tagging, datasets, benchmarks, transformer models
Link: https://arxiv.org/abs/2107.03844
Abstract: Bangla -- ranked as the 6th most widely spoken language across the world (https://www.ethnologue.com/guides/ethnologue200), with 230 million native speakers -- is still considered a low-resource language in the natural language processing (NLP) community. With three decades of research, Bangla NLP (BNLP) is still lagging behind, mainly due to the scarcity of resources and the challenges that come with it. There is sparse work in different areas of BNLP; however, a thorough survey reporting previous work and recent advances is yet to be done. In this study, we first provide a review of Bangla NLP tasks, resources, and tools available to the research community; we benchmark datasets collected from various platforms for nine NLP tasks using current state-of-the-art algorithms (i.e., transformer-based models). We provide comparative results for the studied NLP tasks by comparing monolingual vs. multilingual models of varying sizes. We report our results using both individual and consolidated datasets and provide data splits for future research. We reviewed a total of 108 papers and conducted 175 sets of experiments. Our results show promising performance using transformer-based models while highlighting the trade-off with computational costs. We hope that such a comprehensive survey will motivate the community to build on and further advance the research on Bangla NLP.

【3】 BumbleBee: A Transformer for Music

Authors: Lucas Fenaux, Maria Juliana Quintero
Affiliation: University of Toronto
Note: 8 pages, 3 figures
Link: https://arxiv.org/abs/2107.03443
Abstract: We introduce BumbleBee, a transformer model that generates MIDI music data. We tackle the issue of transformers applied to long sequences by implementing a longformer generative model that uses dilating sliding windows to compute the attention layers. We compare our results to those of the music transformer and Long Short-Term Memory (LSTM) to benchmark our results. This analysis is performed using piano MIDI files, in particular the JSB Chorales dataset, which has already been used in other research works (Huang et al., 2018).
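
The dilated sliding-window attention that BumbleBee borrows from the Longformer can be illustrated with a simple mask: each query attends only to keys at multiples of the dilation within a fixed window. A small sketch follows; the window and dilation values are examples, not taken from the paper:

```python
import torch

def dilated_window_mask(seq_len, window=4, dilation=2):
    """True where attention is allowed: query i sees keys i + k*dilation
    for k in [-window, window]."""
    idx = torch.arange(seq_len)
    rel = idx[None, :] - idx[:, None]          # key offset relative to query
    return (rel.abs() <= window * dilation) & (rel % dilation == 0)

mask = dilated_window_mask(12)
scores = torch.randn(12, 12)                   # raw attention scores
scores = scores.masked_fill(~mask, float("-inf"))
attn = scores.softmax(dim=-1)                  # sparse attention pattern
```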

GAN | Adversarial | Attacks | Generation (4 papers)

【1】 Output Randomization: A Novel Defense for both White-box and Black-box Adversarial Models

Authors: Daniel Park, Haidar Khan, Azer Khan, Alex Gittens, Bülent Yener
Affiliations: Rensselaer Polytechnic Institute, Amazon Alexa, SUNY New Paltz
Note: This is a substantially changed version of an earlier preprint (arXiv:1905.09871)
Link: https://arxiv.org/abs/2107.03806
Abstract: Adversarial examples pose a threat to deep neural network models in a variety of scenarios, from settings where the adversary has complete knowledge of the model (a "white box" setting) to the opposite (a "black box" setting). In this paper, we explore the use of output randomization as a defense against attacks in both the black box and white box models and propose two defenses. In the first defense, we propose output randomization at test time to thwart finite difference attacks in black box settings. Since this type of attack relies on repeated queries to the model to estimate gradients, we investigate the use of randomization to thwart such adversaries from successfully creating adversarial examples. We empirically show that this defense can limit the success rate of a black box adversary using the Zeroth Order Optimization attack to 0%. Secondly, we propose output randomization training as a defense against white box adversaries. Unlike prior approaches that use randomization, our defense does not require its use at test time, eliminating the Backward Pass Differentiable Approximation attack, which was shown to be effective against other randomization defenses. Additionally, this defense has low overhead and is easily implemented, allowing it to be used together with other defenses across various model architectures. We evaluate output randomization training against the Projected Gradient Descent attacker and show that the defense can reduce the PGD attack's success rate down to 12% when using cross-entropy loss.
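
The black-box defense is simple to state: since finite-difference attacks estimate gradients from repeated queries, perturbing the returned scores corrupts those estimates. A hedged sketch of such a test-time wrapper (the noise scale below is an example value, not the paper's setting):

```python
import torch
import torch.nn as nn

class RandomizedOutputModel(nn.Module):
    """Wraps a classifier and adds noise to the scores it exposes."""
    def __init__(self, model, noise_std=0.05):  # noise scale: example value
        super().__init__()
        self.model, self.noise_std = model, noise_std

    def forward(self, x):
        probs = self.model(x).softmax(dim=-1)
        return (probs + self.noise_std * torch.randn_like(probs)).clamp_min(1e-6)

base = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))
defended = RandomizedOutputModel(base)
# An attacker estimating df/dx_i as (f(x + h*e_i) - f(x)) / h now sees a
# quantity dominated by the injected noise when h is small.
out = defended(torch.randn(2, 1, 28, 28))
```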

【2】 Grid Partitioned Attention: Efficient Transformer Approximation with Inductive Bias for High Resolution Detail Generation

Authors: Nikolay Jetchev, Gökhan Yildirim, Christian Bracher, Roland Vollgraf
Affiliation: Zalando Research, Zalando SE, Berlin, Germany
Note: code available at this https URL
Link: https://arxiv.org/abs/2107.03742
Abstract: Attention is a general reasoning mechanism that can flexibly deal with image information, but its memory requirements have so far made it impractical for high resolution image generation. We present Grid Partitioned Attention (GPA), a new approximate attention algorithm that leverages a sparse inductive bias for higher computational and memory efficiency in image domains: queries attend only to few keys, and spatially close queries attend to close keys due to correlations. Our paper introduces the new attention layer, analyzes its complexity, and shows how the trade-off between memory usage and model power can be tuned by the hyper-parameters. We show how such attention enables novel deep learning architectures with copying modules that are especially useful for conditional image generation tasks like pose morphing. Our contributions are (i) the algorithm and code of the novel GPA layer, (ii) a novel deep attention-copying architecture, and (iii) new state-of-the-art experimental results in human pose morphing generation benchmarks.

【3】 Generalization Error of GAN from the Discriminator's Perspective

Authors: Hongkang Yang, Weinan E
Affiliations: Program in Applied and Computational Mathematics, Princeton University; Department of Mathematics, Princeton University; Beijing Institute of Big Data Research
Link: https://arxiv.org/abs/2107.03633
Abstract: The generative adversarial network (GAN) is a well-known model for learning high-dimensional distributions, but the mechanism for its generalization ability is not understood. In particular, GAN is vulnerable to the memorization phenomenon, the eventual convergence to the empirical distribution. We consider a simplified GAN model with the generator replaced by a density, and analyze how the discriminator contributes to generalization. We show that with early stopping, the generalization error measured by Wasserstein metric escapes from the curse of dimensionality, despite that in the long term, memorization is inevitable. In addition, we present a hardness of learning result for WGAN.

【4】 Adaptive Stress Testing for Adversarial Learning in a Financial Environment

Authors: Khalid El-Awady
Link: https://arxiv.org/abs/2107.03577
Abstract: We demonstrate the use of Adaptive Stress Testing to detect and address potential vulnerabilities in a financial environment. We develop a simplified model for credit card fraud detection that utilizes a linear regression classifier based on historical payment transaction data coupled with business rules. We then apply the reinforcement learning model known as Adaptive Stress Testing to train an agent, that can be thought of as a potential fraudster, to find the most likely path to system failure -- successfully defrauding the system. We show the connection between this most likely failure path and the limits of the classifier and discuss how the fraud detection system's business rules can be further augmented to mitigate these failure modes.

Semi-/Weakly-/Un-/Fully-Supervised | Uncertainty | Active Learning (7 papers)

【1】 Offline Meta-Reinforcement Learning with Online Self-Supervision

Authors: Vitchyr H. Pong, Ashvin Nair, Laura Smith, Catherine Huang, Sergey Levine
Affiliation: UC Berkeley
Note: 10 pages, 6 figures
Link: https://arxiv.org/abs/2107.03974
Abstract: Meta-reinforcement learning (RL) can be used to train policies that quickly adapt to new tasks with orders of magnitude less data than standard RL, but this fast adaptation often comes at the cost of greatly increasing the amount of reward supervision during meta-training time. Offline meta-RL removes the need to continuously provide reward supervision because rewards must only be provided once when the offline dataset is generated. In addition to the challenges of offline RL, a unique distribution shift is present in meta RL: agents learn exploration strategies that can gather the experience needed to learn a new task, and also learn adaptation strategies that work well when presented with the trajectories in the dataset, but the adaptation strategies are not adapted to the data distribution that the learned exploration strategies collect. Unlike the online setting, the adaptation and exploration strategies cannot effectively adapt to each other, resulting in poor performance. In this paper, we propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy, and then collects additional unsupervised online data, without any ground truth reward labels, to bridge this distribution shift problem. Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data. By removing the need to provide reward labels for the online experience, our approach can be more practical to use in settings where reward supervision would otherwise be provided manually. We compare our method to prior work on offline meta-RL on simulated robot locomotion and manipulation tasks and find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
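
The self-supervision step can be sketched as follows. This is our schematic reading of the abstract, approximating the "distribution of reward functions" with a bootstrap ensemble; it is not the authors' implementation:

```python
import torch
import torch.nn as nn

def make_reward_net(dim=10):
    return nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, 1))

# 1) Approximate a distribution over reward functions with a bootstrap
#    ensemble fitted on the rewarded offline data.
offline_sa = torch.randn(512, 10)              # offline (state, action) pairs
offline_r = torch.randn(512, 1)                # their ground-truth rewards
ensemble = []
for _ in range(5):
    net = make_reward_net()
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    idx = torch.randint(0, 512, (512,))        # bootstrap resample
    for _ in range(200):
        loss = ((net(offline_sa[idx]) - offline_r[idx]) ** 2).mean()
        opt.zero_grad(); loss.backward(); opt.step()
    ensemble.append(net)

# 2) Sample a reward function and self-label reward-free online transitions.
online_sa = torch.randn(128, 10)
sampled = ensemble[torch.randint(0, len(ensemble), ()).item()]
pseudo_rewards = sampled(online_sa).detach()   # labels for meta-training
```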

【2】 Understanding the Limits of Unsupervised Domain Adaptation via Data Poisoning

Authors: Akshay Mehra, Bhavya Kailkhura, Pin-Yu Chen, Jihun Hamm
Affiliations: Tulane University, Lawrence Livermore National Laboratory, IBM Research
Link: https://arxiv.org/abs/2107.03919
Abstract: Unsupervised domain adaptation (UDA) enables cross-domain learning without target domain labels by transferring knowledge from a labeled source domain whose distribution differs from the target. However, UDA is not always successful and several accounts of "negative transfer" have been reported in the literature. In this work, we prove a simple lower bound on the target domain error that complements the existing upper bound. Our bound shows the insufficiency of minimizing source domain error and marginal distribution mismatch for a guaranteed reduction in the target domain error, due to the possible increase of induced labeling function mismatch. This insufficiency is further illustrated through simple distributions for which the same UDA approach succeeds, fails, and may succeed or fail with an equal chance. Motivated from this, we propose novel data poisoning attacks to fool UDA methods into learning representations that produce large target domain errors. We evaluate the effect of these attacks on popular UDA methods using benchmark datasets where they have been previously shown to be successful. Our results show that poisoning can significantly decrease the target domain accuracy, dropping it to almost 0% in some cases, with the addition of only 10% poisoned data in the source domain. The failure of UDA methods demonstrates the limitations of UDA at guaranteeing cross-domain generalization consistent with the lower bound. Thus, evaluation of UDA methods in adversarial settings such as data poisoning can provide a better sense of their robustness in scenarios unfavorable for UDA.

【3】 The Three Ensemble Clustering (3EC) Algorithm for Pattern Discovery in Unsupervised Learning

Authors: Debasish Kundu
Affiliation: Founder, TUNL (an AI startup)
Note: 17 pages
Link: https://arxiv.org/abs/2107.03729
Abstract: This paper presents a multiple learner algorithm called the 'Three Ensemble Clustering 3EC' algorithm that classifies unlabeled data into quality clusters as a part of unsupervised learning. It offers the flexibility to explore the context of new clusters formed by an ensemble of algorithms based on internal validation indices. It is worth mentioning that the input data set is considered to be a cluster of clusters. An anomaly can possibly manifest as a cluster as well. Each partitioned cluster is considered to be a new data set and is a candidate to explore the most optimal algorithm and its number of partition splits until a predefined stopping criterion is met. The algorithms independently partition the data set into clusters and the quality of the partitioning is assessed by an ensemble of internal cluster validation indices. The 3EC algorithm presents the validation index scores from a choice of algorithms and its configuration of partitions, called the Tau Grid. 3EC chooses the most optimal score. The 3EC algorithm owes its name to the two input ensembles of algorithms and internal validation indices and an output ensemble of final clusters. Quality plays an important role in this clustering approach and it also acts as a stopping criterion from further partitioning. Quality is determined based on the quality of the clusters provided by an algorithm and its optimal number of splits. The 3EC algorithm determines this from the score of the ensemble of validation indices. The user can configure the stopping criteria by providing quality thresholds for the score range of each of the validation indices and the optimal size of the output cluster. The users can experiment with different sets of stopping criteria and choose the most 'sensible group' of quality clusters.
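
A minimal sketch of the recursive ensemble-and-validate loop is below, using two example algorithms and a single validation index (the paper uses ensembles of both), with a quality threshold as the stopping criterion:

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.metrics import silhouette_score

def best_partition(X, k_range=(2, 3, 4)):
    """Try every (algorithm, k) pair; keep the best validation score."""
    candidates = []
    for k in k_range:
        for algo in (KMeans(n_clusters=k, n_init=10),
                     AgglomerativeClustering(n_clusters=k)):
            labels = algo.fit_predict(X)
            candidates.append((silhouette_score(X, labels), labels))
    return max(candidates, key=lambda c: c[0])

def three_ec(X, quality_threshold=0.5, min_size=20):
    if len(X) < min_size:                      # too small to split further
        return [X]
    score, labels = best_partition(X)
    if score < quality_threshold:              # quality acts as stopping rule
        return [X]
    clusters = []
    for c in np.unique(labels):                # each part becomes a new dataset
        clusters.extend(three_ec(X[labels == c], quality_threshold, min_size))
    return clusters

X = np.vstack([np.random.randn(100, 2) + m for m in ((0, 0), (6, 0), (0, 6))])
print(len(three_ec(X)))                        # typically 3 for these blobs
```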

【4】 Proceedings of the First Workshop on Weakly Supervised Learning (WeaSuL)

Authors: Michael A. Hedderich, Benjamin Roth, Katharina Kann, Barbara Plank, Alex Ratner, Dietrich Klakow
Link: https://arxiv.org/abs/2107.03690
Abstract: Welcome to WeaSuL 2021, the First Workshop on Weakly Supervised Learning, co-located with ICLR 2021. In this workshop, we want to advance theory, methods and tools for allowing experts to express prior coded knowledge for automatic data annotations that can be used to train arbitrary deep neural networks for prediction. The ICLR 2021 Workshop on Weak Supervision aims at advancing methods that help modern machine-learning methods to generalize from knowledge provided by experts, in interaction with observable (unlabeled) data. In total, 15 papers were accepted. All the accepted contributions are listed in these Proceedings.

【5】 Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases

Authors: Subhashini Venugopalan, Joel Shor, Manoj Plakal, Jimmy Tobin, Katrin Tomanek, Jordan R. Green, Michael P. Brenner
Affiliations: Google Research; MGH Institute of Health Professions, USA; Harvard University, USA
Note: Accepted at INTERSPEECH 2021
Link: https://arxiv.org/abs/2107.03985
Abstract: Automatic classification of disordered speech can provide an objective tool for identifying the presence and severity of speech impairment. Classification approaches can also help identify hard-to-recognize speech samples to teach ASR systems about the variable manifestations of impaired speech. Here, we develop and compare different deep learning techniques to classify the intelligibility of disordered speech on selected phrases. We collected samples from a diverse set of 661 speakers with a variety of self-reported disorders speaking 29 words or phrases, which were rated by speech-language pathologists for their overall intelligibility using a five-point Likert scale. We then evaluated classifiers developed using 3 approaches: (1) a convolutional neural network (CNN) trained for the task, (2) classifiers trained on non-semantic speech representations from CNNs that used an unsupervised objective [1], and (3) classifiers trained on the acoustic (encoder) embeddings from an ASR system trained on typical speech [2]. We found that the ASR encoder's embeddings considerably outperform the other two on detecting and classifying disordered speech. Further analysis shows that the ASR embeddings cluster speech by the spoken phrase, while the non-semantic embeddings cluster speech by speaker. Also, longer phrases are more indicative of intelligibility deficits than single words.

【6】 Likelihood-Free Frequentist Inference: Bridging Classical Statistics and Machine Learning in Simulation and Uncertainty Quantification

Authors: Niccolò Dalmasso, David Zhao, Rafael Izbicki, Ann B. Lee
Affiliations: Department of Statistics and Data Science, Carnegie Mellon University; Federal University of Sao Carlos
Note: 49 pages, 12 figures, code available at this https URL
Link: https://arxiv.org/abs/2107.03920
Abstract: Many areas of science make extensive use of computer simulators that implicitly encode likelihood functions for complex systems. Classical statistical methods are poorly suited for these so-called likelihood-free inference (LFI) settings, outside the asymptotic and low-dimensional regimes. Although new machine learning methods, such as normalizing flows, have revolutionized the sample efficiency and capacity of LFI methods, it remains an open question whether they produce reliable measures of uncertainty. In this paper, we present a statistical framework for LFI that unifies classical statistics with modern machine learning to: (1) construct frequentist confidence sets and hypothesis tests with finite-sample guarantees of nominal coverage (type I error control) and power, and (2) provide rigorous diagnostics for assessing empirical coverage over the entire parameter space. We refer to our framework as likelihood-free frequentist inference (LF2I). Any method that estimates a test statistic, such as the likelihood ratio, can be plugged into our framework to create powerful tests and confidence sets with correct coverage. In this work, we specifically study two test statistics (ACORE and BFF), which, respectively, maximize versus integrate an odds function over the parameter space. Our theoretical and empirical results offer multifaceted perspectives on error sources and challenges in likelihood-free frequentist inference.

【7】 Label-set Loss Functions for Partial Supervision: Application to Fetal Brain 3D MRI Parcellation

Authors: Lucas Fidon, Michael Aertsen, Doaa Emam, Nada Mufti, Frederic Guffens, Thomas Deprest, Philippe Demaerel, Anna L. David, Andrew Melbourne, Sebastien Ourselin, Jan Deprest, Tom Vercauteren
Affiliations: School of Biomedical Engineering & Imaging Sciences, King's College London, UK; Department of Radiology, University Hospitals Leuven, Belgium; Institute for Women's Health, University College London, UK
Note: Accepted at MICCAI 2021
Link: https://arxiv.org/abs/2107.03846
Abstract: Deep neural networks have increased the accuracy of automatic segmentation, however, their accuracy depends on the availability of a large number of fully segmented images. Methods to train deep neural networks using images for which some, but not all, regions of interest are segmented are necessary to make better use of partially annotated datasets. In this paper, we propose the first axiomatic definition of label-set loss functions that are the loss functions that can handle partially segmented images. We prove that there is one and only one method to convert a classical loss function for fully segmented images into a proper label-set loss function. Our theory also allows us to define the leaf-Dice loss, a label-set generalization of the Dice loss particularly suited for partial supervision with only missing labels. Using the leaf-Dice loss, we set a new state of the art in partially supervised learning for fetal brain 3D MRI segmentation. We achieve a deep neural network able to segment white matter, ventricles, cerebellum, extra-ventricular CSF, cortical gray matter, deep gray matter, brainstem, and corpus callosum based on fetal brain 3D MRI of anatomically normal fetuses or with open spina bifida. Our implementation of the proposed label-set loss functions is available at https://github.com/LucasFidon/label-set-loss-functions
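
The flavor of a label-set loss for missing labels can be sketched as a Dice loss whose per-class terms are restricted to the classes actually annotated in each image. This is our simplified illustration, not the authors' exact leaf-Dice implementation (which is available at the repository above):

```python
import torch

def partial_dice_loss(probs, target_onehot, labeled_mask, eps=1e-6):
    """probs, target_onehot: (B, C, ...); labeled_mask: (B, C) bool that is
    True where class c was actually annotated for sample b."""
    dims = tuple(range(2, probs.dim()))        # sum over spatial dimensions
    inter = (probs * target_onehot).sum(dims)
    denom = probs.sum(dims) + target_onehot.sum(dims)
    dice = (2 * inter + eps) / (denom + eps)   # per-sample, per-class Dice
    return 1 - dice[labeled_mask].mean()       # missing labels contribute nothing

probs = torch.softmax(torch.randn(2, 4, 8, 8, 8), dim=1)  # toy 3D volumes
target = torch.zeros_like(probs); target[:, 0] = 1.0
labeled = torch.tensor([[True, True, False, False],
                        [True, False, True, True]])
print(partial_dice_loss(probs, target, labeled))
```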

Transfer | Zero/Few/One-Shot | Adaptation (4 papers)

【1】 RMA: Rapid Motor Adaptation for Legged Robots

Authors: Ashish Kumar, Zipeng Fu, Deepak Pathak, Jitendra Malik
Affiliations: Carnegie Mellon University, UC Berkeley, Facebook
Note: RSS 2021. Webpage at this https URL
Link: https://arxiv.org/abs/2107.04034
Abstract: Successful real-world deployment of legged robots would require them to adapt in real-time to unseen scenarios like changing terrains, changing payloads, wear and tear. This paper presents Rapid Motor Adaptation (RMA) algorithm to solve this problem of real-time online adaptation in quadruped robots. RMA consists of two components: a base policy and an adaptation module. The combination of these components enables the robot to adapt to novel situations in fractions of a second. RMA is trained completely in simulation without using any domain knowledge like reference trajectories or predefined foot trajectory generators and is deployed on the A1 robot without any fine-tuning. We train RMA on a varied terrain generator using bioenergetics-inspired rewards and deploy it on a variety of difficult terrains including rocky, slippery, deformable surfaces in environments with grass, long vegetation, concrete, pebbles, stairs, sand, etc. RMA shows state-of-the-art performance across diverse real-world as well as simulation experiments. Video results at https://ashish-kmr.github.io/rma-legged-robots/

【2】 Knowledge Transfer by Discriminative Pre-training for Academic Performance Prediction

Authors: Byungsoo Kim, Hangyeol Yu, Dongmin Shin, Youngduck Choi
Affiliation: Riiid! AI Research
Note: EDM 2021
Link: https://arxiv.org/abs/2107.04009
Abstract: The needs for precisely estimating a student's academic performance have been emphasized with an increasing amount of attention paid to Intelligent Tutoring System (ITS). However, since labels for academic performance, such as test scores, are collected from outside of ITS, obtaining the labels is costly, leading to label-scarcity problem which brings challenge in taking machine learning approaches for academic performance prediction. To this end, inspired by the recent advancement of pre-training method in natural language processing community, we propose DPA, a transfer learning framework with Discriminative Pre-training tasks for Academic performance prediction. DPA pre-trains two models, a generator and a discriminator, and fine-tunes the discriminator on academic performance prediction. In DPA's pre-training phase, a sequence of interactions where some tokens are masked is provided to the generator which is trained to reconstruct the original sequence. Then, the discriminator takes an interaction sequence where the masked tokens are replaced by the generator's outputs, and is trained to predict the originalities of all tokens in the sequence. Compared to the previous state-of-the-art generative pre-training method, DPA is more sample efficient, leading to fast convergence to lower academic performance prediction error. We conduct extensive experimental studies on a real-world dataset obtained from a multi-platform ITS application and show that DPA outperforms the previous state-of-the-art generative pre-training method with a reduction of 4.05% in mean absolute error and more robust to increased label-scarcity.
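
The ELECTRA-like pre-training loop can be sketched schematically; the shapes and the two toy models below are placeholders rather than the paper's architecture, and the generator's argmax fill-in stands in for sampling:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab, d = 100, 32                              # toy sizes
generator = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, vocab))
discriminator = nn.Sequential(nn.Embedding(vocab, d), nn.Linear(d, 1))

tokens = torch.randint(1, vocab, (8, 20))       # student interaction sequences
mask = torch.rand(8, 20) < 0.15                 # mask ~15% of tokens
masked = tokens.masked_fill(mask, 0)            # id 0 plays the role of [MASK]

gen_logits = generator(masked)                  # (8, 20, vocab)
gen_loss = F.cross_entropy(gen_logits[mask], tokens[mask])  # reconstruction

filled = tokens.clone()
filled[mask] = gen_logits[mask].argmax(-1)      # generator's picks fill the masks
is_original = (filled == tokens).float()        # 1 where the token survived
disc_loss = F.binary_cross_entropy_with_logits(
    discriminator(filled).squeeze(-1), is_original)
```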

【3】 Adaptation of Quadruped Robot Locomotion with Meta-Learning

Authors: Arsen Kuzhamuratov, Dmitry Sorokin, Alexander Ulanov, A. I. Lvovsky
Affiliations: Russian Quantum Center, Moscow, Russia; Moscow Institute of Physics and Technology, Russia; University of Oxford, United Kingdom
Note: 14 pages, 6 figures
Link: https://arxiv.org/abs/2107.03741
Abstract: Animals have remarkable abilities to adapt locomotion to different terrains and tasks. However, robots trained by means of reinforcement learning are typically able to solve only a single task and a transferred policy is usually inferior to that trained from scratch. In this work, we demonstrate that meta-reinforcement learning can be used to successfully train a robot capable to solve a wide range of locomotion tasks. The performance of the meta-trained robot is similar to that of a robot that is trained on a single task.

【4】 Identification and Adaptation with Binary-Valued Observations under Non-Persistent Excitation Condition

Authors: Lantian Zhang, Yanlong Zhao, Lei Guo
Affiliations: Key Laboratory of Systems and Control, Academy of Mathematics and Systems Science, Chinese Academy of Sciences; School of Mathematical Science, University of Chinese Academy of Sciences, Beijing, China
Note: 11 pages, 4 figures, submitted to Automatica
Link: https://arxiv.org/abs/2107.03588
Abstract: Dynamical systems with binary-valued observations are widely used in information industry, technology of biological pharmacy and other fields. Though much effort has been devoted to the identification of such systems, most of the previous investigations are based on first-order gradient algorithms which usually have much slower convergence rates than the Quasi-Newton algorithm. Moreover, persistence of excitation (PE) conditions are usually required to guarantee consistent parameter estimates in the existing literature, which are hard to verify or guarantee for feedback control systems. In this paper, we propose an online projected Quasi-Newton type algorithm for parameter estimation of stochastic regression models with binary-valued observations and varying thresholds. By using both the stochastic Lyapunov function and martingale estimation methods, we establish the strong consistency of the estimation algorithm and provide the convergence rate, under a signal condition which is considerably weaker than the traditional PE condition and coincides with the weakest possible excitation known for the classical least squares algorithm of stochastic regression models. Convergence of adaptive predictors and their applications in adaptive control are also discussed.

Reinforcement Learning (1 paper)

【1】 CamTuner: Reinforcement-Learning based System for Camera Parameter Tuning to enhance Analytics

Authors: Sibendu Paul, Kunal Rao, Giuseppe Coviello, Murugan Sankaradas, Oliver Po, Y. Charlie Hu, Srimat T. Chakradhar
Affiliations: Purdue University, West Lafayette, USA; NEC Laboratories America, New Jersey, USA
Link: https://arxiv.org/abs/2107.03964
Abstract: Complex sensors like video cameras include tens of configurable parameters, which can be set by end-users to customize the sensors to specific application scenarios. Although parameter settings significantly affect the quality of the sensor output and the accuracy of insights derived from sensor data, most end-users use a fixed parameter setting because they lack the skill or understanding to appropriately configure these parameters. We propose CamTuner, which is a system to automatically and dynamically adapt the complex sensor to changing environments. CamTuner includes two key components. First, a bespoke analytics quality estimator, which is a deep-learning model to automatically and continuously estimate the quality of insights from an analytics unit as the environment around a sensor changes. Second, a reinforcement learning (RL) module, which reacts to the changes in quality, and automatically adjusts the camera parameters to enhance the accuracy of insights. We improve the training time of the RL module by an order of magnitude by designing virtual models to mimic essential behavior of the camera: we design virtual knobs that can be set to different values to mimic the effects of assigning different values to the camera's configurable parameters, and we design a virtual camera model that mimics the output from a video camera at different times of the day. These virtual models significantly accelerate training because (a) frame rates from a real camera are limited to 25-30 fps while the virtual models enable processing at 300 fps, (b) we do not have to wait until the real camera sees different environments, which could take weeks or months, and (c) virtual knobs can be updated instantly, while it can take 200-500 ms to change the camera parameter settings. Our dynamic tuning approach results in up to 12% improvement in the accuracy of insights from several video analytics tasks.
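
The virtual-knob idea is the key to the reported training speedup: a cheap surrogate maps knob settings (and a simulated time of day) to analytics quality instantly instead of at camera frame rates. A toy, entirely hypothetical sketch of such an environment:

```python
import numpy as np

class VirtualCameraEnv:
    """Maps knob settings and a simulated time of day to an analytics-quality
    reward instantly, standing in for a 25-30 fps physical camera."""
    def __init__(self):
        self.knobs = np.array([0.5, 0.5])        # e.g. brightness, contrast

    def step(self, action):
        self.knobs = np.clip(self.knobs + action, 0.0, 1.0)
        hour = np.random.uniform(0, 24)          # simulated time of day
        optimum = 0.3 + 0.4 * hour / 24.0        # best setting drifts with light
        quality = 1.0 - np.abs(self.knobs - optimum).mean()
        return self.knobs.copy(), quality        # observation, reward

env = VirtualCameraEnv()
obs, reward = env.step(np.array([0.05, -0.05]))  # an RL agent would act here
```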

Medical (2 papers)

【1】 Predicting Disease Progress with Imprecise Lab Test Results

Authors: Mei Wang, Jianwen Su, Zhihua Lin
Affiliations: Donghua University, China; University of California, Santa Barbara, USA
Link: https://arxiv.org/abs/2107.03620
Abstract: In existing deep learning methods, almost all loss functions assume that the sample data values to be predicted are the only correct ones. This assumption does not hold for laboratory test data. Test results are often within tolerable or imprecision ranges, with all values in the ranges acceptable. By considering imprecision samples, we propose an imprecision range loss (IR loss) method and incorporate it into a Long Short Term Memory (LSTM) model for disease progress prediction. In this method, each sample in the imprecision range space has a certain probability of being the real value and participates in the loss calculation. The loss is defined as the integral of the error of each point in the imprecision range space. A sampling method for the imprecision space is formulated. The continuous imprecision space is discretized, and a sequence of imprecise data sets is obtained, which is convenient for gradient descent learning. A heuristic learning algorithm is developed to learn the model parameters based on the imprecise data sets. Experimental results on real data show that the prediction method based on IR loss can provide more stable and consistent prediction results when test samples are generated from the imprecision range.
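
Our reading of the IR loss from the abstract: discretize the tolerance range around each target and average the pointwise error over the sampled range, instead of penalizing deviation from a single "correct" value. A sketch with uniform weighting as a simplification:

```python
import torch

def ir_loss(pred, low, high, n_samples=16):
    """pred, low, high: (B,) tensors; [low, high] is the tolerance range."""
    t = torch.linspace(0, 1, n_samples)        # discretize the range
    grid = low[:, None] + t[None, :] * (high - low)[:, None]  # (B, n_samples)
    return ((pred[:, None] - grid) ** 2).mean()  # integral ~ average error

pred = torch.tensor([5.2, 7.9])
low = torch.tensor([4.0, 7.0])
high = torch.tensor([6.0, 9.0])
print(ir_loss(pred, low, high))
```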

【2】 Federated Learning for Multi-Center Imaging Diagnostics: A Study in Cardiovascular Disease

Authors: Akis Linardos, Kaisar Kushibar, Sean Walsh, Polyxeni Gkontra, Karim Lekadir
Affiliations: University of Barcelona, Department of Mathematics and Computer Science, Barcelona, Spain; Radiomics, Liege, Belgium
Note: Code used in this study can be found in: this https URL
Link: https://arxiv.org/abs/2107.03901
Abstract: Deep learning models can enable accurate and efficient disease diagnosis, but have thus far been hampered by the data scarcity present in the medical world. Automated diagnosis studies have been constrained by underpowered single-center datasets, and although some results have shown promise, their generalizability to other institutions remains questionable as the data heterogeneity between institutions is not taken into account. By allowing models to be trained in a distributed manner that preserves patients' privacy, federated learning promises to alleviate these issues, by enabling diligent multi-center studies. We present the first federated learning study on the modality of cardiovascular magnetic resonance (CMR) and use four centers derived from subsets of the M&M and ACDC datasets, focusing on the diagnosis of hypertrophic cardiomyopathy (HCM). We adapt a 3D-CNN network pretrained on action recognition and explore two different ways of incorporating shape prior information to the model, and four different data augmentation set-ups, systematically analyzing their impact on the different collaborative learning choices. We show that despite the small size of data (180 subjects derived from four centers), the privacy preserving federated learning achieves promising results that are competitive with traditional centralized learning. We further find that federatively trained models exhibit increased robustness and are more sensitive to domain shift effects.
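
The collaborative setup follows the usual federated averaging pattern: each center trains locally and only model parameters travel to the server. A generic FedAvg-style sketch (a placeholder linear model stands in for the study's 3D-CNN; none of the hyper-parameters are from the paper):

```python
import copy
import torch
import torch.nn as nn

def local_update(model, data, target, epochs=1, lr=1e-2):
    model = copy.deepcopy(model)                  # data never leaves the center
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        loss = nn.functional.cross_entropy(model(data), target)
        opt.zero_grad(); loss.backward(); opt.step()
    return model.state_dict(), len(data)

global_model = nn.Linear(32, 2)                   # placeholder for the 3D-CNN
centers = [(torch.randn(45, 32), torch.randint(0, 2, (45,))) for _ in range(4)]

for _ in range(10):                               # communication rounds
    updates = [local_update(global_model, x, y) for x, y in centers]
    total = sum(n for _, n in updates)
    averaged = {k: sum(sd[k] * n for sd, n in updates) / total
                for k in updates[0][0]}           # size-weighted FedAvg
    global_model.load_state_dict(averaged)
```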

Federated Learning | Privacy | Encryption (3 papers)

【1】 Management of Resource at the Network Edge for Federated Learning

Authors: Silvana Trindade, Luiz F. Bittencourt, Nelson L. S. da Fonseca
Affiliation: Institute of Computing, State University of Campinas, Campinas, Brazil
Note: arXiv admin note: text overlap with arXiv:1803.05255 by other authors
Link: https://arxiv.org/abs/2107.03428
Abstract: Federated learning has been explored as a promising solution for training at the edge, where end devices collaborate to train models without sharing data with other entities. Since the execution of these learning models occurs at the edge, where resources are limited, new solutions must be developed. In this paper, we describe the recent work on resource management at the edge, and explore the challenges and future directions to allow the execution of federated learning at the edge. Some of the problems of this management, such as discovery of resources, deployment, load balancing, migration, and energy efficiency will be discussed in the paper.

【2】 Federated Learning as a Mean-Field Game

Authors: Arash Mehrjou
Affiliations: ETH Zürich; Max Planck Institute for Intelligent Systems
Link: https://arxiv.org/abs/2107.03770
Abstract: We establish a connection between federated learning, a concept from machine learning, and mean-field games, a concept from game theory and control theory. In this analogy, the local federated learners are considered as the players and the aggregation of the gradients in a central server is the mean-field effect. We present federated learning as a differential game and discuss the properties of the equilibrium of this game. We hope this novel view to federated learning brings together researchers from these two distinct areas to work on fundamental problems of large-scale distributed and privacy-preserving learning algorithms.

【3】 Energy Efficient Federated Learning in Integrated Fog-Cloud Computing Enabled Internet-of-Things Networks

Authors: Mohammed S. Al-Abiad, Md. Zoheb Hassan, Md. Jahangir Hossain
Note: 30, 10, article
Link: https://arxiv.org/abs/2107.03520
Abstract: We investigate resource allocation scheme to reduce the energy consumption of federated learning (FL) in the integrated fog-cloud computing enabled Internet-of-things (IoT) networks. In the envisioned system, IoT devices are connected with the centralized cloud server (CS) via multiple fog access points (F-APs). We consider two different scenarios for training the local models. In the first scenario, local models are trained at the IoT devices and the F-APs upload the local model parameters to the CS. In the second scenario, local models are trained at the F-APs based on the collected data from the IoT devices and the F-APs collaborate with the CS for updating the model parameters. Our objective is to minimize the overall energy-consumption of both scenarios subject to FL time constraint. Towards this goal, we devise a joint optimization of scheduling of IoT devices with the F-APs, transmit power allocation, computation frequency allocation at the devices and F-APs and decouple it into two subproblems. In the first subproblem, we optimize the IoT device scheduling and power allocation, while in the second subproblem, we optimize the computation frequency allocation. For each scenario, we develop a conflict graph based solution to iteratively solve the two subproblems. Simulation results show that the proposed two schemes achieve a considerable performance gain in terms of the energy consumption minimization. The presented simulation results interestingly reveal that for a large number of IoT devices and large data sizes, it is more energy efficient to train the local models at the IoT devices instead of the F-APs.

Inference | Analysis | Understanding | Explainability (6 papers)

【1】 Manifold Hypothesis in Data Analysis: Double Geometrically-Probabilistic Approach to Manifold Dimension Estimation

Authors: Alexander Ivanov, Gleb Nosovskiy, Alexey Chekunov, Denis Fedoseev, Vladislav Kibkalo, Mikhail Nikulin, Fedor Popelenskiy, Stepan Komkov, Ivan Mazurenko, Aleksandr Petiushko
Link: https://arxiv.org/abs/2107.03903
Abstract: The manifold hypothesis states that data points in high-dimensional space actually lie in close vicinity of a manifold of much lower dimension. In many cases this hypothesis was empirically verified and used to enhance unsupervised and semi-supervised learning. Here we present a new approach to manifold hypothesis checking and underlying manifold dimension estimation. In order to do it we use two very different methods simultaneously - one geometric, another probabilistic - and check whether they give the same result. Our geometrical method is a modification for sparse data of a well-known box-counting algorithm for Minkowski dimension calculation. The probabilistic method is new. Although it exploits standard nearest neighborhood distance, it is different from methods which were previously used in such situations. This method is robust, fast and includes a special preliminary data transformation. Experiments on real datasets show that the suggested approach based on the combination of two methods is powerful and effective.
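
The cross-check idea can be reproduced with textbook estimators: a geometric box-counting slope and a probabilistic nearest-neighbor (Levina-Bickel style) estimate should roughly agree when the manifold hypothesis holds. The sketch below uses standard versions of both, not the paper's modified algorithms:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 2000)          # a 1-D circle embedded in 3-D
X = np.c_[np.cos(theta), np.sin(theta), np.zeros_like(theta)]

def box_counting_dim(X, radii=(0.05, 0.1, 0.2)):
    """Geometric estimate: slope of log N(r) against log(1/r)."""
    counts = [len({tuple(p) for p in np.floor(X / r).astype(int)}) for r in radii]
    return np.polyfit(np.log(1.0 / np.array(radii)), np.log(counts), 1)[0]

def knn_dim(X, k=10):
    """Probabilistic estimate from nearest-neighbor distance ratios."""
    d = cKDTree(X).query(X, k + 1)[0][:, 1:]     # drop zero self-distances
    return 1.0 / np.mean(np.log(d[:, -1:] / d[:, :-1]))

print(box_counting_dim(X), knn_dim(X))           # both should be close to 1
```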

【2】 Explainable AI (XAI) for PHM of Industrial Asset: A State-of-The-Art, PRISMA-Compliant Systematic Review

Authors: Ahmad Kamal BIN MOHD NOR, Srinivasa Rao PEDAPATI, Masdi MUHAMMAD
Affiliation: Mechanical Department, Universiti Teknologi Petronas, Malaysia
Link: https://arxiv.org/abs/2107.03869
Abstract: A state-of-the-art systematic review on XAI applied to Prognostic and Health Management (PHM) of industrial assets is presented. The work attempts to provide an overview of the general trend of XAI in PHM, answers the question of accuracy versus explainability, and investigates the extent of human role, explainability evaluation and uncertainty management in PHM XAI. Research articles linked to PHM XAI, in English, from 2015 to 2021 were selected from the IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library and Scopus databases using PRISMA guidelines. Data was extracted from 35 selected articles and examined using MS. Excel. Several findings were synthesized. Firstly, while the discipline is still young, the analysis indicates the growing acceptance of XAI in the PHM domain. Secondly, XAI functions as a double-edged sword, where it is assimilated as a tool to execute PHM tasks as well as a means of explanation, in particular in diagnostics and anomaly detection. There is thus a need for XAI in PHM. Thirdly, the review shows that PHM XAI papers produce either good or excellent results in general, suggesting that PHM performance is unaffected by XAI. Fourthly, human role, explainability metrics and uncertainty management are areas requiring further attention by the PHM community. Adequate explainability metrics to cater for PHM needs are urgently required. Finally, most case studies featured in the accepted articles are based on real data, indicating that available AI and XAI approaches are equipped to solve complex real-world challenges, increasing the confidence of AI model adoption in the industry. This work is funded by the Universiti Teknologi Petronas Foundation.

【3】 Analytically Tractable Hidden-States Inference in Bayesian Neural Networks

Authors: Luong-Ha Nguyen, James-A. Goulet
Affiliation: Department of Civil, Geologic and Mining Engineering, Polytechnique Montréal, Canada
Note: 37 pages, 13 figures
Link: https://arxiv.org/abs/2107.03759
Abstract: With few exceptions, neural networks have been relying on backpropagation and gradient descent as the inference engine in order to learn the model parameters, because the closed-form Bayesian inference for neural networks has been considered to be intractable. In this paper, we show how we can leverage the tractable approximate Gaussian inference's (TAGI) capabilities to infer hidden states, rather than only using it for inferring the network's parameters. One novel aspect it allows is to infer hidden states through the imposition of constraints designed to achieve specific objectives, as illustrated through three examples: (1) the generation of adversarial-attack examples, (2) the usage of a neural network as a black-box optimization method, and (3) the application of inference on continuous-action reinforcement learning. These applications showcase how tasks that were previously reserved to gradient-based optimization approaches can now be approached with analytically tractable inference.

【4】 In-Network Learning: Distributed Training and Inference in Networks

Authors: Matei Moldoveanu, Abdellatif Zaidi
Affiliations: Université Paris-Est, Champs-sur-Marne, France; Mathematical and Algorithmic Sciences Lab, Paris Research Center, Huawei France
Note: Submitted to the IEEE Journal on Selected Areas in Communications (JSAC) Series on Machine Learning for Communications and Networks. arXiv admin note: substantial text overlap with arXiv:2104.14929
Link: https://arxiv.org/abs/2107.03433
Abstract: It is widely perceived that leveraging the success of modern machine learning techniques to mobile devices and wireless networks has the potential of enabling important new services. This, however, poses significant challenges, essentially due to that both data and processing power are highly distributed in a wireless network. In this paper, we develop a learning algorithm and an architecture that make use of multiple data streams and processing units, not only during the training phase but also during the inference phase. In particular, the analysis reveals how inference propagates and fuses across a network. We study the design criterion of our proposed method and its bandwidth requirements. Also, we discuss implementation aspects using neural networks in typical wireless radio access; and provide experiments that illustrate benefits over state-of-the-art techniques.

【5】 Recurrence-Aware Long-Term Cognitive Network for Explainable Pattern Classification

Authors: Gonzalo Nápoles, Yamisleydi Salgueiro, Isel Grau, Maikel Leon Espinosa
Affiliations: Department of Computer Science, Universidad de Talca; Vrije Universiteit Brussel
Link: https://arxiv.org/abs/2107.03423
Abstract: Machine learning solutions for pattern classification problems are nowadays widely deployed in society and industry. However, the lack of transparency and accountability of most accurate models often hinders their meaningful and safe use. Thus, there is a clear need for developing explainable artificial intelligence mechanisms. There exist model-agnostic methods that summarize feature contributions, but their interpretability is limited to specific predictions made by black-box models. An open challenge is to develop models that have intrinsic interpretability and produce their own explanations, even for classes of models that are traditionally considered black boxes like (recurrent) neural networks. In this paper, we propose an LTCN-based model for interpretable pattern classification of structured data. Our method brings its own mechanism for providing explanations by quantifying the relevance of each feature in the decision process. For supporting the interpretability without affecting the performance, the model incorporates more flexibility through a quasi-nonlinear reasoning rule that allows controlling nonlinearity. Besides, we propose a recurrence-aware decision model that evades the issues posed by unique fixed points while introducing a deterministic learning method to compute the learnable parameters. The simulations show that our interpretable model obtains competitive performance when compared to the state-of-the-art white and black boxes.

【6】 Encoding Domain Information with Sparse Priors for Inferring Explainable Latent Variables 标题:利用稀疏先验对领域信息进行编码以推断可解释的潜变量

作者:Arber Qoku,Florian Buettner 机构:German Cancer Consortium (DKTK), German Cancer Research Center (DKFZ), Germany, Goethe University Frankfurt 备注:5 pages, 6 figures, Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare 链接:https://arxiv.org/abs/2107.03730 摘要:潜变量模型是一种强大的统计工具,通过从可观察的高维数据中推断出未观察到的隐藏状态,可以揭示患者或细胞之间的相关变化。然而,现有方法的一个主要缺点是无法学习稀疏且可解释的隐藏状态。此外,在数据的潜在结构的部分知识容易获得的情况下,将先验信息在统计上合理地集成到当前方法中是具有挑战性的。为了解决这些问题,我们提出了spex-LVM,这是一个具有稀疏先验的因子化潜变量模型,以鼓励推断由领域相关信息驱动的可解释因子。spex-LVM利用现有的精选生物医学通路(pathway)知识,自动将注释属性分配给潜在因子,产生可解释的、针对相应兴趣领域定制的结果。对模拟和真实的单细胞RNA-seq数据集的评估表明,我们的模型以一种内在可解释的方式稳健地识别了相关结构,区分了技术噪声和生物医学变异源,并提供了对现有通路注释的数据集特定适配。实现代码可在 https://github.com/MLO-lab/spexlvm 获取。 摘要:Latent variable models are powerful statistical tools that can uncover relevant variation between patients or cells, by inferring unobserved hidden states from observable high-dimensional data. A major shortcoming of current methods, however, is their inability to learn sparse and interpretable hidden states. Additionally, in settings where partial knowledge on the latent structure of the data is readily available, a statistically sound integration of prior information into current methods is challenging. To address these issues, we propose spex-LVM, a factorial latent variable model with sparse priors to encourage the inference of explainable factors driven by domain-relevant information. spex-LVM utilizes existing knowledge of curated biomedical pathways to automatically assign annotated attributes to latent factors, yielding interpretable results tailored to the corresponding domain of interest. Evaluations on simulated and real single-cell RNA-seq datasets demonstrate that our model robustly identifies relevant structure in an inherently explainable manner, distinguishes technical noise from sources of biomedical variation, and provides dataset-specific adaptations of existing pathway annotations. Implementation is available at https://github.com/MLO-lab/spexlvm.

检测相关(5篇)

【1】 Multi-Modality Task Cascade for 3D Object Detection 标题:基于多模态任务级联的三维目标检测

作者:Jinhyung Park,Xinshuo Weng,Yunze Man,Kris Kitani 机构:Carnegie Mellon University 链接:https://arxiv.org/abs/2107.04013 摘要:点云和RGB图像是3D视觉理解中天然互补的两种模态:前者提供物体上稀疏但精确的点位置,后者包含密集的颜色和纹理信息。尽管二者有紧密融合的潜力,许多方法仍将两个模型隔离训练,并用简单的特征拼接来表示3D传感器数据。这种分离的训练方案可能导致次优性能,也使3D任务无法反过来帮助本身常常就有用的2D任务。为了提供一种更为一体化的方法,我们提出了一种新的多模态任务级联网络(MTC-RCNN),该网络利用3D候选框来改进2D分割预测,再利用改进后的2D分割结果进一步细化3D框。我们发现,在两个阶段的3D模块之间加入2D网络可以显著提高2D和3D任务的性能。此外,为了防止3D模块过度依赖过拟合的2D预测,我们提出了一种双头2D分割训练和推理方案,使第二个3D模块学会处理不完美的2D分割预测。在具有挑战性的SUN RGB-D数据集上评估我们的模型,我们大幅改进了单模态和融合网络的最新结果($\textbf{+3.8}$ mAP@0.5)。代码将发布于 https://github.com/Divadi/MTC_RCNN 。 摘要:Point clouds and RGB images are naturally complementary modalities for 3D visual understanding - the former provides sparse but accurate locations of points on objects, while the latter contains dense color and texture information. Despite this potential for close sensor fusion, many methods train two models in isolation and use simple feature concatenation to represent 3D sensor data. This separated training scheme results in potentially sub-optimal performance and prevents 3D tasks from being used to benefit 2D tasks that are often useful on their own. To provide a more integrated approach, we propose a novel Multi-Modality Task Cascade network (MTC-RCNN) that leverages 3D box proposals to improve 2D segmentation predictions, which are then used to further refine the 3D boxes. We show that including a 2D network between two stages of 3D modules significantly improves both 2D and 3D task performance. Moreover, to prevent the 3D module from over-relying on the overfitted 2D predictions, we propose a dual-head 2D segmentation training and inference scheme, allowing the 2nd 3D module to learn to interpret imperfect 2D segmentation predictions. Evaluating our model on the challenging SUN RGB-D dataset, we improve upon state-of-the-art results of both single modality and fusion networks by a large margin ($\textbf{+3.8}$ mAP@0.5). Code will be released $\href{https://github.com/Divadi/MTC_RCNN}{\text{here.}}$

【2】 Optimizing Data Processing in Space for Object Detection in Satellite Imagery 标题:卫星图像目标检测中空间数据处理的优化

作者:Martina Lofqvist,José Cano 机构:School of Computing Science, University of Glasgow, Lilybank Gardens, Glasgow, United Kingdom 备注:Published as a workshop paper at SmallSat 2021 - The 35th Annual Small Satellite Conference. 9 pages, 10 figures. arXiv admin note: text overlap with arXiv:2007.11089 链接:https://arxiv.org/abs/2107.03774 摘要:每年发射的卫星数量激增,导致每天下行数TB的数据。地面站接收到的数据通常是未经处理的,考虑到数据量很大,而且并非所有数据都有用,因此这是一个昂贵的过程。这一点,加上对实时数据处理日益增长的需求,导致对在轨处理解决方案的需求日益增长。在这项工作中,我们通过对卫星数据应用不同的图像压缩技术来研究基于CNN的目标检测器在受限设备上的性能。我们考察了NVIDIA Jetson Nano和NVIDIA Jetson AGX Xavier的能力:这两款低功耗、高性能计算机集成了GPU,体积小到可以装上纳米卫星。我们仔细研究了目标检测网络,包括单发多框检测器(SSD)和基于区域的全卷积网络(R-FCN)模型,这些模型是在DOTA上预训练的,DOTA是一个用于航空图像中目标检测的大规模数据集。性能以执行时间、内存消耗和准确性来衡量,并与包含两个强大GPU的服务器基线进行比较。结果表明,通过应用图像压缩技术,我们可以改善执行时间和内存消耗,实现完全可运行的数据集。无损压缩技术实现了大约10%的执行时间减少和大约3%的内存消耗减少,而不影响精度。而有损压缩技术将执行时间改善了最多144%,内存消耗减少了最多97%。但是,它对精度有很大的影响,且影响程度取决于压缩比。因此,这些压缩技术的应用及其压缩比可能需要根据特定任务所需的精度水平来选择。 摘要:There is a proliferation in the number of satellites launched each year, resulting in downlinking of terabytes of data each day. The data received by ground stations is often unprocessed, making this an expensive process considering the large data sizes and that not all of the data is useful. This, coupled with the increasing demand for real-time data processing, has led to a growing need for on-orbit processing solutions. In this work, we investigate the performance of CNN-based object detectors on constrained devices by applying different image compression techniques to satellite data. We examine the capabilities of the NVIDIA Jetson Nano and NVIDIA Jetson AGX Xavier; low-power, high-performance computers, with integrated GPUs, small enough to fit on-board a nanosatellite. We take a closer look at object detection networks, including the Single Shot MultiBox Detector (SSD) and Region-based Fully Convolutional Network (R-FCN) models that are pre-trained on DOTA - a Large Scale Dataset for Object Detection in Aerial Images. The performance is measured in terms of execution time, memory consumption, and accuracy, and are compared against a baseline containing a server with two powerful GPUs. The results show that by applying image compression techniques, we are able to improve the execution time and memory consumption, achieving a fully runnable dataset. A lossless compression technique achieves roughly a 10% reduction in execution time and about a 3% reduction in memory consumption, with no impact on the accuracy. While a lossy compression technique improves the execution time by up to 144% and the memory consumption is reduced by as much as 97%. However, it has a significant impact on accuracy, varying depending on the compression ratio. Thus the application and ratio of these compression techniques may differ depending on the required level of accuracy for a particular task.

【3】 Direct detection of plasticity onset through total-strain profile evolution 标题:通过总应变剖面演化直接检测塑性起始

作者:Stefanos Papanikolaou,Mikko J. Alava 机构:NOMATEN Centre of Excellence, National Centre of Nuclear Research, A. Soltana ,-, Otwock–´Swierk, Poland., Department of Applied Physics, Aalto University, P.O. Box , FI-, Aalto, Espoo, Finland. 备注:9 pages, 6 figures 链接:https://arxiv.org/abs/2107.03738 摘要:固体中的塑性屈服强烈地依赖于各种条件,例如温度和加载速率,事实上,结构材料屈服点的样本相关知识提高了力学行为的可靠性。通常,屈服是通过小尺度或大尺度的受控机械试验来测量的,其方法要么是将弹性(应力)与总变形测量区分开来,要么是识别塑性滑移贡献。在本文中,我们认为,通过数字图像相关,可以通过对现场测量剖面演化序列中总应变波动的统计分析,而不是单独的弹性/塑性测量,来揭示屈服。我们展示了在广泛应用的晶体塑性模型中,通过主成分分析或离散小波变换,精确量化多晶固体屈服点的两种不同方法。我们通过改变加载速率和应变率敏感指数,在多晶体模拟的合成数据和各种屈服响应中测试和比较了这些方法。 摘要:Plastic yielding in solids strongly depends on various conditions, such as temperature and loading rate and indeed, sample-dependent knowledge of yield points in structural materials promotes reliability in mechanical behavior. Commonly, yielding is measured through controlled mechanical testing at small or large scales, in ways that either distinguish elastic (stress) from total deformation measurements, or by identifying plastic slip contributions. In this paper we argue that instead of separate elastic/plastic measurements, yielding can be unraveled through statistical analysis of total strain fluctuations during the evolution sequence of profiles measured in-situ, through digital image correlation. We demonstrate two distinct ways of precisely quantifying yield locations in widely applicable crystal plasticity models, that apply in polycrystalline solids, either by using principal component analysis or discrete wavelet transforms. We test and compare these approaches in synthetic data of polycrystal simulations and a variety of yielding responses, through changes of the applied loading rates and the strain-rate sensitivity exponents.

【4】 SpecGrav -- Detection of Gravitational Waves using Deep Learning 标题:SpecGrav--基于深度学习的引力波探测

作者:Hrithika Dodia,Himanshu Tandel,Lynette D'Mello 机构:SpecGrav – Detection of Gravitational Waves, using Deep Learning, Mentor: Lynette D’Mello, Affiliation of all authors and mentor: Dwarkadas J. Sanghvi College of Engineering, Vile Parle, Mumbai, Maharashtra, India 链接:https://arxiv.org/abs/2107.03607 摘要:引力波是时空结构中以光速传播的波纹。LIGO探测引力波是天文学领域的重大突破。深度学习已经彻底改变了许多行业,包括医疗、金融和教育。为了克服传统匹配滤波方法的缺点,人们还探索了用于引力波探测的深度学习技术。然而,在一些研究中,神经网络的训练阶段非常耗时,并且需要大内存的硬件设备来完成这项任务。为了减少训练用于探测引力波的神经网络所需的大量硬件资源和时间,我们制作了SpecGrav。利用二维卷积神经网络和嵌入噪声中的引力波谱图,对双星黑洞合并和中子星合并产生的引力波进行了探测。在2GB GPU上,我们的神经网络的训练阶段大约只有19分钟。 摘要:Gravitational waves are ripples in the fabric of space-time that travel at the speed of light. The detection of gravitational waves by LIGO is a major breakthrough in the field of astronomy. Deep Learning has revolutionized many industries including health care, finance and education. Deep Learning techniques have also been explored for detection of gravitational waves to overcome the drawbacks of traditional matched filtering method. However, in several researches, the training phase of neural network is very time consuming and hardware devices with large memory are required for the task. In order to reduce the extensive amount of hardware resources and time required in training a neural network for detecting gravitational waves, we made SpecGrav. We use 2D Convolutional Neural Network and spectrograms of gravitational waves embedded in noise to detect gravitational waves from binary black hole merger and binary neutron star merger. The training phase of our neural network was of about just 19 minutes on a 2GB GPU.

【5】 Sleep syndromes onset detection based on automatic sleep staging algorithm 标题:基于自动睡眠分期算法的睡眠综合征发作检测

作者:Tim Cvetko,Tinkara Robek 备注:12 pages, 3 figures 链接:https://arxiv.org/abs/2107.03387 摘要:在本文中,我们提出了一种新颖且实用的方法来预测睡眠综合征(包括不宁腿综合征、失眠)的早期发作,该方法基于一个由两个模块组成的算法。将快速傅立叶变换应用于30秒长的脑电记录,提供局部的时频信息,并训练深度卷积LSTM神经网络进行睡眠阶段分类。从脑电数据自动检测睡眠阶段为解决日常睡眠不规律问题提供了巨大的潜力。在此基础上,结合信号处理和统计的优点,提出了一种新的睡眠阶段分类方法。在这项研究中,我们使用了PhysioNet睡眠欧洲数据格式(EDF)数据库。代码评估结果令人印象深刻,准确率为86.43,精确率为77.76,召回率为93.32,F1评分为89.12,最终平均错误损失为0.09。 摘要:In this paper, we propose a novel method and a practical approach to predicting early onsets of sleep syndromes, including restless leg syndrome, insomnia, based on an algorithm that is comprised of two modules. A Fast Fourier Transform is applied to 30 seconds long epochs of EEG recordings to provide localized time-frequency information, and a deep convolutional LSTM neural network is trained for sleep stage classification. Automating sleep stages detection from EEG data offers great potential to tackling sleep irregularities on a daily basis. Thereby, a novel approach for sleep stage classification is proposed which combines the best of signal processing and statistics. In this study, we used the PhysioNet Sleep European Data Format (EDF) Database. The code evaluation showed impressive results, reaching an accuracy of 86.43, precision of 77.76, recall of 93.32, F1-score of 89.12 with the final mean false error loss of 0.09.
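下面给出一个与该摘要思路对应的最小示意代码(非论文原实现):先对 30 秒 EEG 片段做短时傅里叶变换得到时频表示,再用一个简化的卷积+LSTM 网络做睡眠分期。其中采样率 100 Hz、STFT 窗长、5 个睡眠阶段以及网络结构均为假设值,仅用于演示数据流。

```python
# 最小示意(非论文原实现):30 秒 EEG 片段 -> 时频图 -> CNN + LSTM 分期
import numpy as np
import torch
import torch.nn as nn
from scipy.signal import stft

FS = 100                 # 假设的采样率 (Hz)
EPOCH_SEC = 30           # 每个片段 30 秒

def epoch_to_spectrogram(epoch_1d):
    """对单通道 30 秒片段做 STFT,返回 (freq_bins, time_bins) 的对数幅度谱。"""
    _, _, Z = stft(epoch_1d, fs=FS, nperseg=128, noverlap=64)
    return np.log1p(np.abs(Z)).astype(np.float32)

class ConvLSTMStager(nn.Module):
    """极简的卷积 + LSTM 睡眠分期网络(结构为示意,非论文原结构)。"""
    def __init__(self, n_classes=5, hidden=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.lstm = nn.LSTM(input_size=32 * 16, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)

    def forward(self, x):              # x: (B, 1, F, T)
        h = self.conv(x)               # (B, 32, F/4, T/4)
        B, C, F, T = h.shape
        h = h.permute(0, 3, 1, 2).reshape(B, T, C * F)  # 按时间轴展开为序列
        out, _ = self.lstm(h)
        return self.head(out[:, -1])   # 用最后时刻的隐状态分类

# 用随机信号演示数据流(仅验证形状)
spec = epoch_to_spectrogram(np.random.randn(FS * EPOCH_SEC))
x = torch.from_numpy(spec)[None, None]    # (1, 1, F, T)
print(ConvLSTMStager()(x).shape)          # torch.Size([1, 5])
```

实际使用时应替换为 Sleep-EDF 等真实数据,并按论文的指标(准确率、精确率、召回率、F1)评估。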

分类|识别(2篇)

【1】 Malware Classification Using Deep Boosted Learning 标题:基于深度增强学习的恶意软件分类

作者:Muhammad Asam,Saddam Hussain Khan,Tauseef Jamal,Umme Zahoora,Asifullah Khan 机构:Pattern Recognition Lab, Department of Computer & Information Sciences, PIEAS, PIEAS Artificial Intelligence Center (PAIC), PIEAS, Islamabad , Pakistan, Center for Mathematical Sciences, PIEAS, Nilore, Islamabad , Pakistan 链接:https://arxiv.org/abs/2107.04008 摘要:网络空间中的恶意活动已经不仅仅是黑客攻击机器和传播病毒。它已经成为一个国家生存的挑战,并因此演变为网络战。恶意软件是网络犯罪的重要组成部分,其分析是抵御攻击的第一道防线。提出了一种新的基于深度增强混合学习的恶意软件分类框架,命名为基于深度增强特征空间的恶意软件分类(DFS-MC)。在该框架中,通过融合性能最好的定制CNN模型的特征空间和支持向量机进行分类,提高了识别能力。通过与标准定制cnn的比较,评估了该分类框架的识别能力。定制的CNN模型通过两种方式实现:softmax分类器和基于深度混合学习的恶意软件分类。在混合学习中,从定制的CNN结构中提取深层特征,并将其输入到传统的机器学习分类器中,以提高分类性能。我们还通过微调在定制的基于CNN架构的恶意软件分类框架中引入了迁移学习的概念。利用hold-out交叉验证技术,在MalImg恶意软件数据集上验证了所提出的恶意软件分类方法的性能。实验比较采用创新的、定制的CNN,从零开始训练,并采用迁移学习对定制的CNN进行微调。提出的分类框架DFS-MC显示了改进的结果,准确率为98.61%,F值为0.96,准确率为0.96,召回率为0.96。 摘要:Malicious activities in cyberspace have gone further than simply hacking machines and spreading viruses. It has become a challenge for a nations survival and hence has evolved to cyber warfare. Malware is a key component of cyber-crime, and its analysis is the first line of defence against attack. This work proposes a novel deep boosted hybrid learning-based malware classification framework and named as Deep boosted Feature Space-based Malware classification (DFS-MC). In the proposed framework, the discrimination power is enhanced by fusing the feature spaces of the best performing customized CNN architectures models and its discrimination by an SVM for classification. The discrimination capacity of the proposed classification framework is assessed by comparing it against the standard customized CNNs. The customized CNN models are implemented in two ways: softmax classifier and deep hybrid learning-based malware classification. In the hybrid learning, Deep features are extracted from customized CNN architectures and fed into the conventional machine learning classifier to improve the classification performance. We also introduced the concept of transfer learning in a customized CNN architecture based malware classification framework through fine-tuning. The performance of the proposed malware classification approaches are validated on the MalImg malware dataset using the hold-out cross-validation technique. Experimental comparisons were conducted by employing innovative, customized CNN, trained from scratch and fine-tuning the customized CNN using transfer learning. The proposed classification framework DFS-MC showed improved results, Accuracy: 98.61%, F-score: 0.96, Precision: 0.96, and Recall: 0.96.
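摘要中的"深度混合学习"可以用如下最小示意来理解(非论文原实现):把 CNN 当作特征提取器,将深层特征交给传统机器学习分类器(这里用 SVM)。骨干网络选用 resnet18、输入尺寸等均为假设,恶意软件图像以随机张量代替。

```python
# 最小示意(非论文原实现):CNN 深层特征 + SVM 的混合分类流程
import torch
import torchvision.models as models
from sklearn.svm import SVC

backbone = models.resnet18(weights=None)   # 实际可换成论文中定制/微调的 CNN
backbone.fc = torch.nn.Identity()          # 去掉分类头,输出 512 维深层特征
backbone.eval()

with torch.no_grad():
    imgs = torch.randn(64, 3, 224, 224)    # 代表恶意软件灰度图转成的 3 通道输入(假设)
    feats = backbone(imgs).numpy()

labels = (feats[:, 0] > 0).astype(int)     # 占位标签,仅演示接口
svm = SVC(kernel="rbf").fit(feats, labels) # 深层特征喂给传统分类器
print(svm.score(feats, labels))
```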

【2】 Image Resolution Susceptibility of Face Recognition Models 标题:人脸识别模型的图像分辨率敏感性

作者:Martin Knoche,Stefan Hörmann,Gerhard Rigoll 机构:Chair of Human-Machine Communication, Technical University of Munich, Germany 备注:19 pages, 15 figures, 2 tables 链接:https://arxiv.org/abs/2107.03769 摘要:人脸识别方法通常假设待验证的两幅人脸图像具有相同的分辨率。然而,在实际应用中,由于图像采集机制或来源不同,这些图像分辨率通常不在同一范围内。在这项工作中,我们首先使用一个最先进的人脸识别模型来分析图像分辨率对人脸验证性能的影响。对于合成降采样到$5\,\times 5\,\mathrm{px}$分辨率的图像,验证性能从$99.23\%$逐渐下降到接近$55\%$。特别是对于交叉分辨率图像对(一幅高分辨率图像和一幅低分辨率图像),验证精度进一步降低。我们通过观察每个双图像测试对的特征距离来更深入地研究这种行为。为了解决这一问题,我们提出了以下两种方法:1) 直接训练一个最先进的人脸识别模型,在每批中混入$50\%$的低分辨率图像;2) 训练孪生网络结构,并在高分辨率和低分辨率特征之间添加余弦距离特征损失。这两种方法都改善了交叉分辨率场景,并且可以在非常低的分辨率下将精度提高到大约$70\%$。然而,缺点是需要为每个分辨率对训练一个特定的模型…… 摘要:Face recognition approaches often rely on equal image resolution for verifying faces across two images. However, in practical applications, those image resolutions are usually not in the same range due to different image capture mechanisms or sources. In this work, we first analyze the impact of image resolutions on the face verification performance with a state-of-the-art face recognition model. For images synthetically reduced to $5\,\times 5\,\mathrm{px}$ resolution, the verification performance drops from $99.23\%$ down to almost $55\%$. Especially, for cross-resolution image pairs (one high- and one low-resolution image), the verification accuracy decreases even further. We investigate this behavior more in-depth by looking at the feature distances for every 2-image test pair. To tackle this problem, we propose the following two methods: 1) Train a state-of-the-art face-recognition model straightforward with $50\%$ low-resolution images directly within each batch. 2) Train a siamese-network structure and add a cosine distance feature loss between high- and low-resolution features. Both methods show an improvement for cross-resolution scenarios and can increase the accuracy at very low resolution to approximately $70\%$. However, a disadvantage is that a specific model needs to be trained for every resolution-pair ...
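对于文中方法 1)(每批直接混入 50% 低分辨率图像),一个最小的数据增强示意如下(非论文原实现):随机选取一半样本,先降采样到低分辨率再插值回原尺寸。低分辨率 16 px、双线性插值等均为假设参数。

```python
# 最小示意(非论文原实现):batch 内随机把 50% 的人脸图像做"低清重采样"
import torch
import torch.nn.functional as F

def mix_resolution(batch, low_px=16, ratio=0.5):
    """随机选 ratio 比例的样本,先降到 low_px 再插值回原尺寸。"""
    n = int(len(batch) * ratio)
    idx = torch.randperm(len(batch))[:n]
    low = F.interpolate(batch[idx], size=(low_px, low_px),
                        mode="bilinear", align_corners=False)
    batch = batch.clone()
    batch[idx] = F.interpolate(low, size=batch.shape[-2:],
                               mode="bilinear", align_corners=False)
    return batch

faces = torch.randn(32, 3, 112, 112)   # 代表一个 batch 的人脸图像
aug = mix_resolution(faces)            # 其中 16 张变成了低分辨率重采样版本
```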

表征(2篇)

【1】 3D Neural Scene Representations for Visuomotor Control 标题:视觉运动控制中的三维神经场景表示

作者:Yunzhu Li,Shuang Li,Vincent Sitzmann,Pulkit Agrawal,Antonio Torralba 机构:MIT CSAIL 备注:First two authors contributed equally. Project Page: this https URL 链接:https://arxiv.org/abs/2107.04004 摘要:人类对我们周围的3D环境有很强的直觉理解。我们大脑中的物理模型适用于不同材料的物体,使我们能够执行各种各样的操作任务,这些任务远远超出了当前机器人的能力范围。在这项工作中,我们希望学习模型的动态三维场景纯粹从二维视觉观察。该模型将神经辐射场(NeRF)和时间对比学习与自动编码框架相结合,学习视点不变的三维感知场景表示。我们证明了一个动力学模型,建立在学习的表示空间,使视觉运动控制具有挑战性的操纵任务涉及到刚体和流体,其中目标是在一个不同于机器人操作的视点指定。当与自动解码框架相结合时,它甚至可以支持来自训练分布之外的摄像机视点的目标规范。通过对未来的预测和新颖的视图合成,进一步证明了所学习的三维动力学模型的丰富性。最后,我们提供了详细的烧蚀研究不同的系统设计和定性分析的学习表示。 摘要:Humans have a strong intuitive understanding of the 3D environment around us. The mental model of the physics in our brain applies to objects of different materials and enables us to perform a wide range of manipulation tasks that are far beyond the reach of current robots. In this work, we desire to learn models for dynamic 3D scenes purely from 2D visual observations. Our model combines Neural Radiance Fields (NeRF) and time contrastive learning with an autoencoding framework, which learns viewpoint-invariant 3D-aware scene representations. We show that a dynamics model, constructed over the learned representation space, enables visuomotor control for challenging manipulation tasks involving both rigid bodies and fluids, where the target is specified in a viewpoint different from what the robot operates on. When coupled with an auto-decoding framework, it can even support goal specification from camera viewpoints that are outside the training distribution. We further demonstrate the richness of the learned 3D dynamics model by performing future prediction and novel view synthesis. Finally, we provide detailed ablation studies regarding different system designs and qualitative analysis of the learned representations.

【2】 Impossibility results for fair representations 标题:公平陈述的不可能结果

作者:Tosca Lechner,Shai Ben-David,Sushant Agarwal,Nivasini Ananthakrishnan 机构:School of Computer Science, University of Waterloo, Waterloo, Ontario, Canada 链接:https://arxiv.org/abs/2107.03483 摘要:随着人们对机器学习中公平性的日益认识和数据表示在数据处理任务中的核心作用的实现,公平数据表示的概念引起了人们的广泛兴趣。这种表示的目的是保证在表示下对数据进行训练的模型(例如,分类器)能够遵守一些公平性约束。当这些表示可以固定用于各种不同任务上的训练模型时,以及当它们充当原始数据(表示设计者已知)和潜在恶意代理(使用表示下的数据来学习预测模型和作出决策)之间的数据过滤时,这些表示是有用的。最近的一长串研究论文都致力于为实现这些目标提供工具。然而,我们证明这基本上是徒劳的努力。粗略地说,我们证明了对于不同的训练任务,没有一种表示方法能够保证分类器的公平性;即使是实现与标签无关的人口均等公平的基本目标,一旦边际数据分布发生变化,也会失败。更精确的公平性概念,如赔率相等,不能通过不考虑特定于任务的标记规则的表示来保证,这些规则将被评估为公平性(即使边际数据分布是已知的先验)。此外,除了一些琐碎的情况外,没有一种表示方法能够保证任意两个不同任务的几率相等和公平性,同时允许对这两个任务进行准确的标签预测。虽然我们的一些结论是直观的,我们制定(并证明)这种不可能的明确声明,往往是对比的印象传达了许多最近的作品公平表示。 摘要:With the growing awareness to fairness in machine learning and the realization of the central role that data representation has in data processing tasks, there is an obvious interest in notions of fair data representations. The goal of such representations is that a model trained on data under the representation (e.g., a classifier) will be guaranteed to respect some fairness constraints. Such representations are useful when they can be fixed for training models on various different tasks and also when they serve as data filtering between the raw data (known to the representation designer) and potentially malicious agents that use the data under the representation to learn predictive models and make decisions. A long list of recent research papers strive to provide tools for achieving these goals. However, we prove that this is basically a futile effort. Roughly stated, we prove that no representation can guarantee the fairness of classifiers for different tasks trained using it; even the basic goal of achieving label-independent Demographic Parity fairness fails once the marginal data distribution shifts. More refined notions of fairness, like Odds Equality, cannot be guaranteed by a representation that does not take into account the task specific labeling rule with respect to which such fairness will be evaluated (even if the marginal data distribution is known a priory). Furthermore, except for trivial cases, no representation can guarantee Odds Equality fairness for any two different tasks, while allowing accurate label predictions for both. While some of our conclusions are intuitive, we formulate (and prove) crisp statements of such impossibilities, often contrasting impressions conveyed by many recent works on fair representations.

预测|估计(8篇)

【1】 A Machine Learning Approach to Safer Airplane Landings: Predicting Runway Conditions using Weather and Flight Data 标题:飞机安全着陆的机器学习方法:利用天气和飞行数据预测跑道状况

作者:Alise Danielle Midtfjord,Riccardo De Bin,Arne Bang Huseby 机构:Department of Mathematics, University of Oslo, Norway 链接:https://arxiv.org/abs/2107.04010 摘要:跑道表面的冰雪减少了减速和方向控制所需的可用轮胎-路面摩擦,并在冬季对航空业造成潜在的经济和安全威胁。为了启动适当的安全程序,飞行员需要准确及时地了解跑道表面的实际情况。本研究利用XGBoost建立了一个综合跑道评估系统,该系统包括一个用于预测湿滑状况的分类模型和一个用于预测湿滑程度的回归模型。这些模型是根据天气数据和跑道报告数据训练的。跑道表面状况由轮胎-路面摩擦系数表示,该系数是根据着陆飞机的飞行传感器数据估计的。为了评估模型的性能,将其与几种最先进的跑道评估方法进行了比较。XGBoost模型以ROC AUC为0.95识别湿滑跑道,以MAE为0.0254预测摩擦系数,并优于以往的所有方法。结果表明,当领域知识用于变量提取时,机器学习方法对复杂物理现象的建模能力较强,具有较高的精度。XGBoost模型与SHAP(SHapley加法解释)近似相结合,为机场运营商和飞行员提供了一个可理解的决策支持系统,有助于提高机场跑道的安全性和经济性。 摘要:The presence of snow and ice on runway surfaces reduces the available tire-pavement friction needed for retardation and directional control and causes potential economic and safety threats for the aviation industry during the winter seasons. To activate appropriate safety procedures, pilots need accurate and timely information on the actual runway surface conditions. In this study, XGBoost is used to create a combined runway assessment system, which includes a classification model to predict slippery conditions and a regression model to predict the level of slipperiness. The models are trained on weather data and data from runway reports. The runway surface conditions are represented by the tire-pavement friction coefficient, which is estimated from flight sensor data from landing aircrafts. To evaluate the performance of the models, they are compared to several state-of-the-art runway assessment methods. The XGBoost models identify slippery runway conditions with a ROC AUC of 0.95, predict the friction coefficient with a MAE of 0.0254, and outperform all the previous methods. The results show the strong abilities of machine learning methods to model complex, physical phenomena with a good accuracy when domain knowledge is used in the variable extraction. The XGBoost models are combined with SHAP (SHapley Additive exPlanations) approximations to provide a comprehensible decision support system for airport operators and pilots, which can contribute to safer and more economic operations of airport runways.
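按摘要描述的"分类 + 回归 + SHAP 解释"流程,可以写出如下最小示意(非论文原实现):特征、摩擦系数的生成方式以及 0.4 的湿滑阈值均为假设,仅演示接口的组合方式。

```python
# 最小示意(非论文原实现):XGBoost 分类 + 回归 + SHAP 解释
import numpy as np
import xgboost as xgb
import shap

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))                # 假设的气象/跑道报告特征
friction = 0.4 + 0.1 * X[:, 0] - 0.05 * X[:, 1] + rng.normal(0, 0.02, 500)
slippery = (friction < 0.4).astype(int)      # 假设:摩擦系数低于 0.4 记为湿滑

clf = xgb.XGBClassifier(n_estimators=200, max_depth=4).fit(X, slippery)
reg = xgb.XGBRegressor(n_estimators=200, max_depth=4).fit(X, friction)

# SHAP 值给出每个特征对单次预测的贡献,可作为决策支持的解释
explainer = shap.TreeExplainer(reg)
shap_values = explainer.shap_values(X[:5])
print(reg.predict(X[:5]), shap_values.shape)  # 5 个摩擦系数预测及其特征贡献
```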

【2】 Imitation by Predicting Observations 标题:通过预测观测进行模仿

作者:Andrew Jaegle,Yury Sulsky,Arun Ahuja,Jake Bruce,Rob Fergus,Greg Wayne 备注:ICML 2021 链接:https://arxiv.org/abs/2107.03851 摘要:模仿学习使代理能够重用和适应他人来之不易的专业知识,为学习行为中的几个关键挑战提供了解决方案。虽然在现实世界中很容易观察到行为,但其底层的动作可能无法获取。我们提出了一种新的仅从观测值进行模仿的方法,该方法在具有挑战性的连续控制任务上取得了与专家相当的性能,同时在存在与任务无关的观测值时也表现出鲁棒性。我们将该方法称为FORM("未来观测奖励模型",Future Observation Reward Model),它由逆强化学习(IRL)目标推导而来,通过对专家观测进行生成式建模来学习专家行为模型并据此模仿,而不需要真实动作标注。我们证明了FORM在DeepMind Control Suite基准上的性能与强基线IRL方法(GAIL)相当,而在存在任务无关特征时,它的性能优于GAIL。 摘要:Imitation learning enables agents to reuse and adapt the hard-won expertise of others, offering a solution to several key challenges in learning behavior. Although it is easy to observe behavior in the real-world, the underlying actions may not be accessible. We present a new method for imitation solely from observations that achieves comparable performance to experts on challenging continuous control tasks while also exhibiting robustness in the presence of observations unrelated to the task. Our method, which we call FORM (for "Future Observation Reward Model") is derived from an inverse RL objective and imitates using a model of expert behavior learned by generative modelling of the expert's observations, without needing ground truth actions. We show that FORM performs comparably to a strong baseline IRL method (GAIL) on the DeepMind Control Suite benchmark, while outperforming GAIL in the presence of task-irrelevant features.

【3】 Short-term Renewable Energy Forecasting in Greece using Prophet Decomposition and Tree-based Ensembles 标题:基于PROPHET分解和基于树的集成的希腊短期可再生能源预测

作者:Argyrios Vartholomaios,Stamatis Karlos,Eleftherios Kouloumpris,Grigorios Tsoumakas 机构: School of Informatics, Aristotle University, Thessaloniki, Greece, Medoid AI, Egnatia St., Thessaloniki, Greece 备注:11 pages, 7 figures 链接:https://arxiv.org/abs/2107.03825 摘要:利用可再生能源的能源生产由于其间歇性的性质而表现出固有的不确定性。然而,统一的欧洲能源市场促进了区域能源系统运营商对可再生能源(RES)的不断渗透。因此,RES预测可以帮助整合这些不稳定的能源,因为它可以提高电力系统的可靠性和降低辅助运行成本。本文提出了一个新的希腊太阳能和风能发电量预测数据集,并介绍了一个特征工程管道,丰富了数据集的维数空间。此外,我们提出了一种新的方法,利用创新的Prophet模型(一种端到端的预测工具)在分解能源时间序列时考虑多种非线性趋势,然后由基于树的集成模型给出短期预测。系统的性能通过具有代表性的评估指标来衡量,并通过估计模型在行业提供的绝对误差阈值方案下的泛化程度来衡量。所提出的混合模型与基线持久性模型、基于树的回归集成和Prophet模型相比较,成功地超越了它们,呈现出更低的错误率和更有利的误差分布。 摘要:Energy production using renewable sources exhibits inherent uncertainties due to their intermittent nature. Nevertheless, the unified European energy market promotes the increasing penetration of renewable energy sources (RES) by the regional energy system operators. Consequently, RES forecasting can assist in the integration of these volatile energy sources, since it leads to higher reliability and reduced ancillary operational costs for power systems. This paper presents a new dataset for solar and wind energy generation forecast in Greece and introduces a feature engineering pipeline that enriches the dimensional space of the dataset. In addition, we propose a novel method that utilizes the innovative Prophet model, an end-to-end forecasting tool that considers several kinds of nonlinear trends in decomposing the energy time series before a tree-based ensemble provides short-term predictions. The performance of the system is measured through representative evaluation metrics, and by estimating the model's generalization under an industry-provided scheme of absolute error thresholds. The proposed hybrid model competes with baseline persistence models, tree-based regression ensembles, and the Prophet model, managing to outperform them, presenting both lower error rates and more favorable error distribution.
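文中"先用 Prophet 分解、再用树集成做短期修正"的混合思路,大致可以如下实现(最小示意,非论文原实现):数据为人造的日度序列,滞后特征 (1, 2, 7) 与 sklearn 的梯度提升回归器均为演示用的假设选择。

```python
# 最小示意(非论文原实现):Prophet 分解 + 树集成拟合残差的混合预测
import numpy as np
import pandas as pd
from prophet import Prophet
from sklearn.ensemble import GradientBoostingRegressor

# 构造一条假设的日度发电量序列
ds = pd.date_range("2019-01-01", periods=730, freq="D")
y = 100 + 20 * np.sin(2 * np.pi * ds.dayofyear / 365) + np.random.normal(0, 5, 730)
df = pd.DataFrame({"ds": ds, "y": y})

m = Prophet(yearly_seasonality=True)
m.fit(df)
decomp = m.predict(df)                       # 含趋势、季节项与 yhat
residual = df["y"].values - decomp["yhat"].values

# 用简单的滞后特征让树集成学习 Prophet 残差(特征工程为示意)
lags = np.stack([np.roll(residual, k) for k in (1, 2, 7)], axis=1)[7:]
gbr = GradientBoostingRegressor().fit(lags, residual[7:])

# 最终短期预测 = Prophet 分解项 + 树集成的残差修正
final = decomp["yhat"].values[7:] + gbr.predict(lags)
print(final[:5])
```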

【4】 Probabilistic Time Series Forecasting with Implicit Quantile Networks 标题:基于隐式分位数网络的概率时间序列预测

作者:Adèle Gouttes,Kashif Rasul,Mateusz Koren,Johannes Stephan,Tofigh Naghibi 备注:Accepted at the ICML 2021 Time Series Workshop 链接:https://arxiv.org/abs/2107.03743 摘要:本文提出了一种概率时间序列预测的通用方法。我们将自回归循环神经网络与隐式分位数网络相结合来建立时间动力学模型,以学习时间序列目标上的一大类分布。与其他在真实数据和模拟数据上评估的概率神经网络预测模型相比,该方法在逐点预测精度和底层时间分布的估计方面都更具优势。 摘要:Here, we propose a general method for probabilistic time series forecasting. We combine an autoregressive recurrent neural network to model temporal dynamics with Implicit Quantile Networks to learn a large class of distributions over a time-series target. When compared to other probabilistic neural forecasting models on real- and simulated data, our approach is favorable in terms of point-wise prediction accuracy as well as on estimating the underlying temporal distribution.
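隐式分位数网络的核心训练信号是分位数(pinball)损失:每步随机采样分位数水平 tau,并把它作为网络的额外输入。下面是该损失的一个最小示意(非论文原实现),pred 以随机张量代替以 tau 为条件的网络输出。

```python
# 最小示意(非论文原实现):随机分位数水平 + pinball 损失
import torch

def quantile_loss(pred, target, tau):
    """pinball 损失:tau * (y - q)_+ + (1 - tau) * (q - y)_+"""
    diff = target - pred
    return torch.mean(torch.maximum(tau * diff, (tau - 1) * diff))

# 每个训练步随机采样 tau ~ U(0,1);网络以 tau 为条件输出对应分位数预测
tau = torch.rand(32, 1)
pred = torch.randn(32, 1, requires_grad=True)  # 代表以 tau 为条件的网络输出
target = torch.randn(32, 1)
loss = quantile_loss(pred, target, tau)
loss.backward()
```

在所有 tau 上最小化该损失,等价于让网络刻画目标的整个条件分布,而不只是条件均值。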

【5】 IowaRain: A Statewide Rain Event Dataset Based on Weather Radars and Quantitative Precipitation Estimation 标题:IowaRain:基于天气雷达和定量降水估计的全州降雨事件数据集

作者:Muhammed Sit,Bong-Chul Seo,Ibrahim Demir 机构: The University of Iowa 备注:4 pages, Accepted to Tackling Climate Change with Machine Learning workshop at ICML 2021 链接:https://arxiv.org/abs/2107.03432 摘要:将机器学习与传统物理模型相结合进行大规模环境建模,有助于实现应对气候变化的有效环境规划和管理。为了开发和改进这些模型,从业者和研究人员需要综合的基准数据集,这些数据集是用他们可以依赖的环境专业知识准备和处理的。本研究提供了从美国国家气象局下一代天气雷达(NEXRAD)系统获取并由定量降水量估算系统处理的爱荷华州(2016-2019)广泛的降雨事件数据集。本研究提供的数据集可用于更好的灾害监测、响应和恢复,为预测性和规范性建模铺平了道路。 摘要:Effective environmental planning and management to address climate change could be achieved through extensive environmental modeling with machine learning and conventional physical models. In order to develop and improve these models, practitioners and researchers need comprehensive benchmark datasets that are prepared and processed with environmental expertise that they can rely on. This study presents an extensive dataset of rainfall events for the state of Iowa (2016-2019) acquired from the National Weather Service Next Generation Weather Radar (NEXRAD) system and processed by a quantitative precipitation estimation system. The dataset presented in this study could be used for better disaster monitoring, response and recovery by paving the way for both predictive and prescriptive modeling.

【6】 Locally differentially private estimation of nonlinear functionals of discrete distributions 标题:离散分布非线性泛函的局部差分隐私估计

作者:Cristina Butucea,Yann Issartel 链接:https://arxiv.org/abs/2107.03940 摘要:研究了局部差分隐私下离散分布非线性泛函的估计问题。初始数据$x_1,\ldots,x_n\in[K]$假定为i.i.d.,服从未知离散分布$p=(p_1,\ldots,p_K)$。只有$\alpha$-局部差分隐私(LDP)样本$z_1,\ldots,z_n$是公开的,其中"局部"一词表示每个$z_i$仅由单个个体的属性$x_i$生成。我们给出的隐私机制(PM)既有交互式的(即允许机制使用已经发布的私有化数据),也有非交互式的。我们刻画了幂和泛函$F_{\gamma}=\sum_{k=1}^K p_k^{\gamma}$($\gamma>0$)估计的二次风险关于$K,\,n$和$\alpha$的行为。在非交互情形下,对所有$\gamma>0$,我们研究了$F_{\gamma}$的两个插入式(plug-in)估计量,它们与Jiao等人(2017)在多项式模型中分析的MLE类似。然而,由于隐私约束,我们得到的速率较慢,与Collier等人(2020)在高斯模型中得到的速率类似。在交互式情形下,对所有$\gamma>1$我们引入一个两步过程,当$\gamma\geq 2$时可达到更快的参数速率$(n\alpha^2)^{-1/2}$。我们还给出了对所有$\alpha$-LDP机制和所有基于私有样本的估计量的下界结果。 摘要:We study the problem of estimating non-linear functionals of discrete distributions in the context of local differential privacy. The initial data $x_1,\ldots,x_n \in [K]$ are supposed i.i.d. and distributed according to an unknown discrete distribution $p = (p_1,\ldots,p_K)$. Only $\alpha$-locally differentially private (LDP) samples $z_1,...,z_n$ are publicly available, where the term 'local' means that each $z_i$ is produced using one individual attribute $x_i$. We exhibit privacy mechanisms (PM) that are interactive (i.e. they are allowed to use already published confidential data) or non-interactive. We describe the behavior of the quadratic risk for estimating the power sum functional $F_{\gamma} = \sum_{k=1}^K p_k^{\gamma}$, $\gamma >0$ as a function of $K, \, n$ and $\alpha$. In the non-interactive case, we study two plug-in type estimators of $F_{\gamma}$, for all $\gamma >0$, that are similar to the MLE analyzed by Jiao et al. (2017) in the multinomial model. However, due to the privacy constraint the rates we attain are slower and similar to those obtained in the Gaussian model by Collier et al. (2020). In the interactive case, we introduce for all $\gamma >1$ a two-step procedure which attains the faster parametric rate $(n \alpha^2)^{-1/2}$ when $\gamma \geq 2$. We give lower bounds results over all $\alpha$-LDP mechanisms and all estimators using the private samples.
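作为说明,下面给出一个非交互式 LDP 下插入式(plug-in)估计幂和泛函的最小示意:采用 K 元随机化响应作为隐私机制(这不是论文所用的机制,仅为常见的简单替代),对私有化直方图去偏后再代入 $F_{\gamma}$。其中 q 的取值保证该机制满足 $\alpha$-LDP。

```python
# 最小示意(非论文机制):K 元随机化响应 + 去偏直方图 + 插入式幂和估计
import numpy as np

def rr_privatize(x, K, alpha, rng):
    """以概率 q 如实上报,否则在 [K] 上均匀上报;该 q 恰好保证 alpha-LDP。"""
    q = (np.exp(alpha) - 1) / (np.exp(alpha) - 1 + K)
    flip = rng.random(len(x)) >= q
    z = x.copy()
    z[flip] = rng.integers(0, K, flip.sum())
    return z, q

def plugin_power_sum(z, K, q, gamma):
    """去偏:E[hist_k] = q*p_k + (1-q)/K,故 p_hat = (hist - (1-q)/K)/q。"""
    hist = np.bincount(z, minlength=K) / len(z)
    p_hat = np.clip((hist - (1 - q) / K) / q, 0, 1)
    return float(np.sum(p_hat ** gamma))

rng = np.random.default_rng(0)
K, alpha, gamma = 10, 1.0, 2.0
p = np.linspace(1, 2, K); p /= p.sum()        # 一个假设的真实分布
x = rng.choice(K, size=50000, p=p)
z, q = rr_privatize(x, K, alpha, rng)
print(plugin_power_sum(z, K, q, gamma), float(np.sum(p ** gamma)))
```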

【7】 Consistency of the Maximal Information Coefficient Estimator 标题:最大信息系数估计的相合性

作者:John Lazarsfeld,Aaron Johnson 机构:Yale University, U.S. Naval Research Laboratory 链接:https://arxiv.org/abs/2107.03836 摘要:Reshef等人(Science,2011)的最大信息系数(MIC)是一种统计量,用于测量大型数据集中变量对之间的依赖性。本文证明了MIC是相应的总体统计量MIC$_*$的一致估计。这修正了Reshef等人(JMLR,2016)的一个论点中的错误,我们对此进行了描述。 摘要:The Maximal Information Coefficient (MIC) of Reshef et al. (Science, 2011) is a statistic for measuring dependence between variable pairs in large datasets. In this note, we prove that MIC is a consistent estimator of the corresponding population statistic MIC$_*$. This corrects an error in an argument of Reshef et al. (JMLR, 2016), which we describe.

【8】 Assessing putative bias in prediction of anti-microbial resistance from real-world genotyping data under explicit causal assumptions 标题:在显式因果假设下从真实世界基因分型数据评估预测抗微生物耐药性的假定偏差

作者:Mattia Prosperi,Simone Marini,Christina Boucher,Jiang Bian 机构:Department of Epidemiology, University of Florida, Gainesville, Florida, USA, Department of Computer Science and Information and, Engineering, University of Florida, Department of Health Outcomes and Biomedical, Informatics, University of Florida 备注:In DSHealth '21] Joint KDD 2021 Health Day and 2021 KDD Workshop on Applied Data Science for Healthcare, Aug 14--18, 2021, Virtual, 5 pages 链接:https://arxiv.org/abs/2107.03383 摘要:全基因组测序(WGS)由于能够获得有关引起耐药性和驱动病原体移动的基因和机制的高分辨率信息,正迅速成为鉴定抗菌药物耐药性(AMR)的常用手段。相比之下,传统的表型(抗生谱)测试不能很容易地阐明这些信息。然而,从基因型-表型数据开发AMR预测工具可能有偏差,因为抽样是非随机化的。样本来源、采集期和物种代表性会混淆遗传性状与AMR的关联。因此,预测模型在样本分布发生变化的新数据上的表现较差。在这项工作中——在一组明确的因果假设下——我们使用PATRIC系统资源整合中心(PATRIC)的基因型-表型AMR数据评估基于倾向的再平衡和混杂调整对AMR预测的有效性。我们为四环素类药物选择细菌基因型(编码为k-mer特征码,即长度为k的DNA片段)、国家、年份、物种和AMR表型,准备来自单个国家的最新基因组的测试数据。我们测试增强逻辑回归(BLR)和随机森林(RF)有/无偏差处理。在10936例中,我们发现了物种、位置和年份与AMR表型不平衡的证据。遗传特征对AMR影响的粗变化与偏误调整的变化有所不同,但只是适度变化(从4000多万k-mers中选择前20000个)。RF(0.95)的接收器工作特性(AUROC)下的面积与BLR(0.94)的面积相当,这两种情况下,自举和外部测试(n=1085)的袋外样品的AUROC都没有减少。我们观察到,与单独使用遗传特征相比,偏倚处理的AUROC增加了1%-5%。。。 摘要:Whole genome sequencing (WGS) is quickly becoming the customary means for identification of antimicrobial resistance (AMR) due to its ability to obtain high resolution information about the genes and mechanisms that are causing resistance and driving pathogen mobility. By contrast, traditional phenotypic (antibiogram) testing cannot easily elucidate such information. Yet development of AMR prediction tools from genotype-phenotype data can be biased, since sampling is non-randomized. Sample provenience, period of collection, and species representation can confound the association of genetic traits with AMR. Thus, prediction models can perform poorly on new data with sampling distribution shifts. In this work -- under an explicit set of causal assumptions -- we evaluate the effectiveness of propensity-based rebalancing and confounding adjustment on AMR prediction using genotype-phenotype AMR data from the Pathosystems Resource Integration Center (PATRIC). We select bacterial genotypes (encoded as k-mer signatures, i.e. DNA fragments of length k), country, year, species, and AMR phenotypes for the tetracycline drug class, preparing test data with recent genomes coming from a single country. We test boosted logistic regression (BLR) and random forests (RF) with/without bias-handling. On 10,936 instances, we find evidence of species, location and year imbalance with respect to the AMR phenotype. The crude versus bias-adjusted change in effect of genetic signatures on AMR varies but only moderately (selecting the top 20,000 out of 40+ million k-mers). The area under the receiver operating characteristic (AUROC) of the RF (0.95) is comparable to that of BLR (0.94) on both out-of-bag samples from bootstrap and the external test (n=1,085), where AUROCs do not decrease. We observe a 1%-5% gain in AUROC with bias-handling compared to the sole use of genetic signatures. ...

其他神经网络|深度学习|模型|建模(14篇)

【1】 Bootstrapping Generalization of Process Models Discovered From Event Data 标题:从事件数据中发现的流程模型泛化性的自举估计

作者:Artem Polyvyanyy,Alistair Moffat,Luciano García-Bañuelos 机构:The University of Melbourne, Luciano Garc´ıa-Ba˜nuelos, Tecnol´ogico de Monterrey 备注:8 pages 链接:https://arxiv.org/abs/2107.03876 摘要:流程挖掘研究从IT系统的事件日志中记录的流程执行中获取价值的方法,流程发现的任务是为某个未知系统发出的事件日志推断流程模型。发现的过程模型的一个质量标准是泛化。泛化旨在量化发现的模型对系统未来执行的描述程度,这可能是流程挖掘中最难理解的质量标准。缺乏理解主要是泛化的结果,泛化试图度量系统的整个未来行为的属性,而唯一可用的行为样本是事件日志本身提供的。在本文中,我们从计算统计学中得到启发,并利用bootstrap方法来估计基于样本的种群的性质。具体地说,我们定义了一个基于事件日志的模型泛化估计量,然后使用bootstrapping来度量模型相对于系统的泛化程度及其统计显著性。实验证明了该方法在工业环境下的可行性。 摘要:Process mining studies ways to derive value from process executions recorded in event logs of IT-systems, with process discovery the task of inferring a process model for an event log emitted by some unknown system. One quality criterion for discovered process models is generalization. Generalization seeks to quantify how well the discovered model describes future executions of the system, and is perhaps the least understood quality criterion in process mining. The lack of understanding is primarily a consequence of generalization seeking to measure properties over the entire future behavior of the system, when the only available sample of behavior is that provided by the event log itself. In this paper, we draw inspiration from computational statistics, and employ a bootstrap approach to estimate properties of a population based on a sample. Specifically, we define an estimator of the model's generalization based on the event log it was discovered from, and then use bootstrapping to measure the generalization of the model with respect to the system, and its statistical significance. Experiments demonstrate the feasibility of the approach in industrial settings.
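文中借用的 bootstrap 思路可以用如下最小示意说明(非论文的泛化估计量本身):对样本有放回重抽样,反复计算统计量,以估计其抽样分布与置信区间。示例中用"唯一 trace 变体的占比"充当一个玩具统计量,事件日志以随机整数序列代替。

```python
# 最小示意(非论文原实现):通用的 bootstrap 统计量与置信区间
import numpy as np

def bootstrap_ci(sample, statistic, n_boot=1000, alpha=0.05, seed=0):
    rng = np.random.default_rng(seed)
    n = len(sample)
    stats = np.array([
        statistic(sample[rng.integers(0, n, n)])   # 有放回重抽样
        for _ in range(n_boot)
    ])
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return stats.mean(), (lo, hi)

# 每个整数代表事件日志中的一种 trace 变体(假设的玩具数据)
log = np.random.default_rng(1).integers(0, 50, size=200)
uniq_ratio = lambda s: len(np.unique(s)) / len(s)  # 玩具化的"泛化"统计量
print(bootstrap_ci(log, uniq_ratio))
```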

【2】 SSSE: Efficiently Erasing Samples from Trained Machine Learning Models 标题:SSSE:从训练好的机器学习模型中高效地擦除样本

作者:Alexandra Peste,Dan Alistarh,Christoph H. Lampert 机构:IST Austria 链接:https://arxiv.org/abs/2107.03860 摘要:大量用户提供的数据的可用性是机器学习在许多实际任务中取得成功的关键。最近,人们越来越意识到,应该让用户更多地控制他们的数据的使用方式。特别是,用户应有权禁止将其数据用于训练机器学习系统,并有权将其从已经训练过的系统中删除。虽然已经提出了几种样本擦除方法,但它们都存在一些缺点,阻碍了它们的广泛应用。大多数方法要么只适用于非常特定的模型族,要么牺牲了太多原始模型的精度,要么对内存或计算要求过高。本文提出了一种高效的样本擦除算法SSSE,该算法适用于一类广泛的机器学习模型。通过对模型损失情况的二阶分析,我们导出了模型参数的一个封闭形式更新步骤,该步骤只需要访问要删除的数据,而不需要访问原始的训练集。在CelebFaces attributes(CelebA)、Animals with attributes 2(AwA2)和CIFAR10三个数据集上的实验表明,在某些情况下,SSSE几乎可以消除样本,这是一个最佳的、但不切实际的金标准,即只使用允许的数据从头开始训练新模型。 摘要:The availability of large amounts of user-provided data has been key to the success of machine learning for many real-world tasks. Recently, an increasing awareness has emerged that users should be given more control about how their data is used. In particular, users should have the right to prohibit the use of their data for training machine learning systems, and to have it erased from already trained systems. While several sample erasure methods have been proposed, all of them have drawbacks which have prevented them from gaining widespread adoption. Most methods are either only applicable to very specific families of models, sacrifice too much of the original model's accuracy, or they have prohibitive memory or computational requirements. In this paper, we propose an efficient and effective algorithm, SSSE, for samples erasure, that is applicable to a wide class of machine learning models. From a second-order analysis of the model's loss landscape we derive a closed-form update step of the model parameters that only requires access to the data to be erased, not to the original training set. Experiments on three datasets, CelebFaces attributes (CelebA), Animals with Attributes 2 (AwA2) and CIFAR10, show that in certain cases SSSE can erase samples almost as well as the optimal, yet impractical, gold standard of training a new model from scratch with only the permitted data.
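摘要所述"基于损失面二阶分析的闭式更新"在形式上接近牛顿步式的机器遗忘(unlearning)。下面用带 L2 正则的逻辑回归给出一个最小示意(非论文的精确公式):为了自包含,这里仍在保留数据上计算 Hessian,而论文的要点恰恰是只需访问被删数据,此处属于简化假设。

```python
# 最小示意(非论文精确公式):牛顿步式的闭式样本删除
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def grad_hess(theta, X, y, lam):
    """带 L2 正则的逻辑回归:平均损失的梯度与 Hessian。"""
    p = sigmoid(X @ theta)
    g = X.T @ (p - y) / len(y) + lam * theta
    H = (X.T * (p * (1 - p))) @ X / len(y) + lam * np.eye(len(theta))
    return g, H

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (sigmoid(X @ rng.normal(size=5)) > rng.random(300)).astype(float)
lam, m = 1e-2, 30                      # m 为要删除的样本数

theta = np.zeros(5)                    # 先在全量数据上用牛顿法训练
for _ in range(20):
    g, H = grad_hess(theta, X, y, lam)
    theta -= np.linalg.solve(H, g)

# 闭式删除:theta' = theta + H_keep^{-1} (sum_i grad_i + m*lam*theta) / (n - m)
g_erase = X[:m].T @ (sigmoid(X[:m] @ theta) - y[:m]) + m * lam * theta
_, H_keep = grad_hess(theta, X[m:], y[m:], lam)
theta_unlearned = theta + np.linalg.solve(H_keep, g_erase) / (len(y) - m)

theta_retrain = np.zeros(5)            # 与在保留数据上重训的"金标准"对比
for _ in range(20):
    g, H = grad_hess(theta_retrain, X[m:], y[m:], lam)
    theta_retrain -= np.linalg.solve(H, g)
print(np.linalg.norm(theta_unlearned - theta_retrain))   # 应当非常小
```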

【3】 Quadruplet Deep Metric Learning Model for Imbalanced Time-series Fault Diagnosis 标题:非平衡时间序列故障诊断的四元组深度度量学习模型

作者:Xingtai Gui,Jiyang Zhang 机构:University of Electronic Science and Technology of China 链接:https://arxiv.org/abs/2107.03786 摘要:基于数据驱动和深度学习的智能诊断方法是近年来研究的热点。然而,在实际应用场景中,时序故障的不平衡性是一个亟待解决的问题。从贝叶斯概率的角度分析了如何通过调整类间距离和类内分布来提高非平衡分类的性能,提出了一种基于深度度量学习的时间序列故障诊断模型。作为深度度量学习的核心,在借鉴传统深度度量学习的基础上,提出了一种考虑不平衡类的四元组数据对设计方法。基于这类数据对,本文提出了一种考虑类间距离和类内数据分布的四元组损失函数,并对不平衡样本对给予特别关注。四元组损失与softmax损失函数的合理组合可以减小不平衡的影响。在两个开放数据集上进行了实验,验证了模型的有效性和鲁棒性。实验结果表明,该方法能有效地提高非平衡分类的性能。 摘要:Intelligent diagnosis method based on data-driven and deep learning is an attractive and meaningful field in recent years. However, in practical application scenarios, the imbalance of time-series fault is an urgent problem to be solved. From the perspective of Bayesian probability, this paper analyzes how to improve the performance of imbalanced classification by adjusting the distance between classes and the distribution within a class and proposes a time-series fault diagnosis model based on deep metric learning. As a core of deep metric learning, a novel quadruplet data pair design considering imbalance class is proposed with reference to traditional deep metric learning. Based on such data pair, this paper proposes a quadruplet loss function which takes into account the inter-class distance and the intra-class data distribution, and pays special attention to imbalanced sample pairs. The reasonable combination of quadruplet loss and softmax loss function can reduce the impact of imbalance. Experiments on two open datasets are carried out to verify the effectiveness and robustness of the model. Experimental results show that the proposed method can effectively improve the performance of imbalanced classification.
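下面给出通用四元组损失的一个最小示意(非论文针对不平衡类的精确设计):在三元组损失之外,额外约束两个负类样本之间的距离,从而同时影响类间距离与类内分布。margin 取值为假设。

```python
# 最小示意(非论文精确设计):通用的四元组(quadruplet)损失
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, pos, neg1, neg2, m1=1.0, m2=0.5):
    """anchor/pos 同类;neg1、neg2 来自两个都不同于 anchor 的类。"""
    d_ap = F.pairwise_distance(anchor, pos)
    d_an = F.pairwise_distance(anchor, neg1)
    d_nn = F.pairwise_distance(neg1, neg2)
    # 第一项是经典三元组约束;第二项额外拉开"负类对"的距离
    return (F.relu(d_ap - d_an + m1) + F.relu(d_ap - d_nn + m2)).mean()

emb = lambda: torch.randn(16, 32, requires_grad=True)  # 代表编码器输出的嵌入
loss = quadruplet_loss(emb(), emb(), emb(), emb())
loss.backward()
```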

【4】 MOD-Net: A Machine Learning Approach via Model-Operator-Data Network for Solving PDEs 标题:MOD-Net:一种基于模型-算子-数据网络求解偏微分方程的机器学习方法

作者:Lulu Zhang,Tao Luo,Yaoyu Zhang,Zhi-Qin John Xu,Zheng Ma 机构: School of Mathematical Sciences, Institute of Natural Sciences, Shanghai Jiao Tong, University, Shanghai, China., MOE-LSC and Qing Yuan Research Institute, Shanghai Jiao Tong University 链接:https://arxiv.org/abs/2107.03673 摘要:本文提出了一种求解偏微分方程的模型-算子-数据网络(MOD-Net)。MOD-Net由模型驱动,基于算子表示并辅以数据正则化来求解偏微分方程。在这项工作中,我们使用深度神经网络参数化格林函数。经验风险由控制方程的均方残差、边界条件和少量标签组成,这些标签用传统格式在粗网格点上数值计算,计算量小。仅使用标签数据集或仅使用模型约束,都不足以针对复杂问题精确地训练MOD-Net。直观地说,标签数据集在模型约束之外起到了正则化的作用。由于MOD-Net还利用了偏微分方程的控制方程和边界条件信息,而不是只依赖昂贵的标签,因此MOD-Net比原始的神经算子更为高效。由于MOD-Net学习的是偏微分方程的格林函数,它求解的是一类偏微分方程,而非某个具体算例。数值计算表明,MOD-Net在求解Poisson方程和一维Boltzmann方程时是非常高效的。对于非线性偏微分方程,当格林函数的概念不再适用时,非线性MOD-Net可以类似地用作求解非线性偏微分方程的ansatz。 摘要:In this paper, we propose a model-operator-data network (MOD-Net) for solving PDEs. A MOD-Net is driven by a model to solve PDEs based on operator representation with regularization from data. In this work, we use a deep neural network to parameterize the Green's function. The empirical risk consists of the mean square of the governing equation, boundary conditions, and a few labels, which are numerically computed by traditional schemes on coarse grid points with cheap computation cost. With only the labeled dataset or only the model constraints, it is insufficient to accurately train a MOD-Net for complicated problems. Intuitively, the labeled dataset works as a regularization in addition to the model constraints. The MOD-Net is much more efficient than the original neural operator because the MOD-Net also uses the information of the governing equation and the boundary conditions of the PDE rather than purely the expensive labels. Since the MOD-Net learns the Green's function of a PDE, it solves a type of PDEs rather than a specific case. We numerically show MOD-Net is very efficient in solving Poisson equation and one-dimensional Boltzmann equation. For non-linear PDEs, where the concept of the Green's function does not apply, the non-linear MOD-Net can be similarly used as an ansatz for solving non-linear PDEs.
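以一维 Poisson 方程 $-u''(x)=f(x)$、零边界条件为例,下面是"用神经网络参数化格林函数 + 方程残差/边界损失"的最小示意(非论文原实现):解由求积近似 $u(x)\approx \frac{1}{M}\sum_j G_\theta(x,y_j)f(y_j)$ 给出;论文中还会加入少量粗网格标签项作为正则,此处省略。网络宽度、求积节点数等均为假设。

```python
# 最小示意(非论文原实现):神经网络参数化格林函数求解 1D Poisson 方程
import torch
import torch.nn as nn

G = nn.Sequential(nn.Linear(2, 64), nn.Tanh(),
                  nn.Linear(64, 64), nn.Tanh(), nn.Linear(64, 1))
f = lambda y: torch.sin(torch.pi * y)              # 假设的源项
y_quad = torch.linspace(0, 1, 128).reshape(-1, 1)  # 求积节点

def u(x):                                          # x: (N,1),需 requires_grad
    X = x.repeat_interleave(len(y_quad), 0)
    Y = y_quad.repeat(len(x), 1)
    g = G(torch.cat([X, Y], dim=1)).reshape(len(x), -1)
    return (g * f(y_quad).T).mean(dim=1, keepdim=True)  # 对 y 求积近似积分

opt = torch.optim.Adam(G.parameters(), lr=1e-3)
for step in range(200):
    x = torch.rand(64, 1, requires_grad=True)
    ux = u(x)
    du = torch.autograd.grad(ux.sum(), x, create_graph=True)[0]
    d2u = torch.autograd.grad(du.sum(), x, create_graph=True)[0]
    res = ((-d2u - f(x)) ** 2).mean()              # 控制方程残差
    bc = (u(torch.tensor([[0.0], [1.0]])) ** 2).mean()  # 边界条件
    loss = res + bc                                # 论文中另加少量标签项,此处省略
    opt.zero_grad(); loss.backward(); opt.step()
```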

【5】 MAFIA: Machine Learning Acceleration on FPGAs for IoT Applications 标题:MAFIA:面向物联网应用的FPGA机器学习加速

作者:Nikhil Pratap Ghanathe,Vivek Seshadri,Rahul Sharma,Steve Wilton,Aayan Kumar 机构:University of British Columbia, Microsoft Research, University of California, Berkeley 备注:Accepted at The International Conference on Field-Programmable Logic and Applications (FPL), 2021 链接:https://arxiv.org/abs/2107.03653 摘要:最近在ML方面的突破产生了新的模型,允许ML推断直接在毫瓦级物联网设备上运行。一方面,现有的ML-to-FPGA编译器是为大型FPGA上的深度神经网络而设计的。另一方面,通用HLS工具无法利用ML推理特有的属性,从而导致性能不理想。我们提出MAFIA,一个工具来编译用于物联网应用的小尺寸fpga的ML推理。MAFIA提供了对线性代数运算的本地支持,可以表示各种ML算法,包括最新的模型。我们表明,MAFIA生成的程序比商业HLS编译器的最佳性能平均高出2.5倍。 摘要:Recent breakthroughs in ML have produced new classes of models that allow ML inference to run directly on milliwatt-powered IoT devices. On one hand, existing ML-to-FPGA compilers are designed for deep neural-networks on large FPGAs. On the other hand, general-purpose HLS tools fail to exploit properties specific to ML inference, thereby resulting in suboptimal performance. We propose MAFIA, a tool to compile ML inference on small form-factor FPGAs for IoT applications. MAFIA provides native support for linear algebra operations and can express a variety of ML algorithms, including state-of-the-art models. We show that MAFIA-generated programs outperform best-performing variant of a commercial HLS compiler by 2.5x on average.

【6】 Sublinear Regret for Learning POMDPs 标题:学习POMDP的次线性遗憾

作者:Yi Xiong,Ningyuan Chen,Xuefeng Gao,Xiang Zhou 机构:Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Hong Kong 链接:https://arxiv.org/abs/2107.03635 摘要:研究了部分可观测马尔可夫决策过程(POMDP)的基于模型的无折扣强化学习。我们所考虑的oracle是在环境已知时,POMDP在无限时域平均报酬意义下的最优策略。基于隐马尔可夫模型矩估计的谱方法、POMDP中的信念误差控制以及在线学习的置信上界方法,我们提出了一种学习算法。我们为所提出的学习算法建立了$O(T^{2/3}\sqrt{\log T})$的遗憾界,其中$T$是学习时域。据我们所知,这是第一个在学习一般POMDP时相对于该oracle实现次线性遗憾的算法。 摘要:We study the model-based undiscounted reinforcement learning for partially observable Markov decision processes (POMDPs). The oracle we consider is the optimal policy of the POMDP with a known environment in terms of the average reward over an infinite horizon. We propose a learning algorithm for this problem, building on spectral method-of-moments estimations for hidden Markov models, the belief error control in POMDPs and upper-confidence-bound methods for online learning. We establish a regret bound of $O(T^{2/3}\sqrt{\log T})$ for the proposed learning algorithm where $T$ is the learning horizon. This is, to the best of our knowledge, the first algorithm achieving sublinear regret with respect to our oracle for learning general POMDPs.

【7】 CSDI: Conditional Score-based Diffusion Models for Probabilistic Time Series Imputation 标题:CSDI:用于概率时间序列插补的条件分数扩散模型

作者:Yusuke Tashiro,Jiaming Song,Yang Song,Stefano Ermon 机构:Department of Computer Science, Stanford University, Stanford, CA, USA, Mitsubishi UFJ Trust Investment Technology Institute, Tokyo, Japan, Japan Digital Design, Tokyo, Japan 链接:https://arxiv.org/abs/2107.03502 摘要:时间序列缺失值的插补在医疗卫生和金融领域有着广泛的应用。虽然自回归模型是时间序列插补的自然候选模型,但基于分数的扩散模型最近在图像生成和音频合成等许多任务中的性能超过了包括自回归模型在内的现有同类模型,有望用于时间序列插补。本文提出了一种新的时间序列插补方法:条件分数扩散插补模型(CSDI),它利用以观测数据为条件的基于分数的扩散模型。与现有的基于分数的方法不同,条件扩散模型被显式地训练用于插补,并且可以利用观测值之间的相关性。在医疗保健和环境数据上,以常用性能指标衡量,CSDI比现有的概率插补方法提高了40-70%。此外,与最先进的确定性插补方法相比,CSDI的确定性插补可将误差降低5-20%。CSDI还可以应用于时间序列插值和概率预测,与现有基线相比具有竞争力。 摘要:The imputation of missing values in time series has many applications in healthcare and finance. While autoregressive models are natural candidates for time series imputation, score-based diffusion models have recently outperformed existing counterparts including autoregressive models in many tasks such as image generation and audio synthesis, and would be promising for time series imputation. In this paper, we propose Conditional Score-based Diffusion models for Imputation (CSDI), a novel time series imputation method that utilizes score-based diffusion models conditioned on observed data. Unlike existing score-based approaches, the conditional diffusion model is explicitly trained for imputation and can exploit correlations between observed values. On healthcare and environmental data, CSDI improves by 40-70% over existing probabilistic imputation methods on popular performance metrics. In addition, deterministic imputation by CSDI reduces the error by 5-20% compared to the state-of-the-art deterministic imputation methods. Furthermore, CSDI can also be applied to time series interpolation and probabilistic forecasting, and is competitive with existing baselines.
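CSDI 式训练中"显式面向插补"的关键一步,是把一部分已观测值随机抽出来当作插补目标、其余观测值作为条件。下面是该划分的最小示意(非论文原实现),target_ratio 等参数为假设。

```python
# 最小示意(非论文原实现):自监督插补训练中的 条件/目标 mask 划分
import torch

def split_condition_target(x, observed_mask, target_ratio=0.2, seed=None):
    """observed_mask 中 1 表示有观测。返回 (条件 mask, 目标 mask)。"""
    gen = torch.Generator().manual_seed(seed) if seed is not None else None
    rand = torch.rand(x.shape, generator=gen)
    target_mask = observed_mask * (rand < target_ratio).float()
    cond_mask = observed_mask - target_mask
    return cond_mask, target_mask

x = torch.randn(4, 50)                        # (batch, 序列长度)
observed = (torch.rand(4, 50) > 0.3).float()  # 假设约 30% 原本缺失
cond, target = split_condition_target(x, observed, seed=0)
# 训练时:对 target 位置加噪,让扩散模型以 cond 位置的真实值为条件去噪;
# 去噪损失只在 target mask 上计算,从而显式学习"以观测为条件的插补"。
```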

【8】 S^3: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks 标题:S^3:用于低比特移位网络有效训练的符号-稀疏-移位重参数化

作者:Xinlin Li,Bang Liu,Yaoliang Yu,Wulong Liu,Chunjing Xu,Vahid Partovi Nia 机构:Noah's Ark Lab, Huawei Technologies., Department of Computer Science and Operations Research (DIRO), University of Montreal., Cheriton School of Computer Science, University of Waterloo. 链接:https://arxiv.org/abs/2107.03453 摘要:移位神经网络通过去除昂贵的乘法运算和将连续权值量化为低位离散值来降低计算复杂度,与传统神经网络相比,移位神经网络具有快速和节能的特点。然而,现有的移位网络对权值初始化非常敏感,并且由于梯度消失和权值符号冻结问题,导致性能下降。为了解决这些问题,我们提出了S^3重参数化,一种训练低比特移位网络的新技术。我们的方法将离散参数按符号-稀疏-移位的三重方式分解。这样,它可以有效地学习一个低比特网络,其权值动态特性类似于全精度网络,且对权值初始化不敏感。我们提出的训练方法突破了移位神经网络的界限,显示3比特移位网络在ImageNet的top-1精度上优于其全精度对应网络。 摘要:Shift neural networks reduce computation complexity by removing expensive multiplication operations and quantizing continuous weights into low-bit discrete values, which are fast and energy efficient compared to conventional neural networks. However, existing shift networks are sensitive to the weight initialization, and also yield a degraded performance caused by vanishing gradient and weight sign freezing problem. To address these issues, we propose S^3 re-parameterization, a novel technique for training low-bit shift networks. Our method decomposes a discrete parameter in a sign-sparse-shift 3-fold manner. In this way, it efficiently learns a low-bit network with a weight dynamics similar to full-precision networks and insensitive to weight initialization. Our proposed training method pushes the boundaries of shift neural networks and shows that 3-bit shift networks outperform their full-precision counterparts in terms of top-1 accuracy on ImageNet.
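下面按"符号 × 稀疏门控 × 2 的幂次移位"的思路给出一个最小重参数化示意(非论文的精确分解):三个因子各自用连续参数表示,前向取离散值、反向用直通估计器(STE)传梯度;移位方向取 $2^{-p}$、最大移位量等均为假设。

```python
# 最小示意(非论文精确分解):符号-稀疏-移位三因子重参数化 + STE
import torch

class SignSparseShift(torch.nn.Module):
    def __init__(self, shape, max_shift=3):
        super().__init__()
        self.s = torch.nn.Parameter(torch.randn(shape))  # 符号的连续代理
        self.t = torch.nn.Parameter(torch.randn(shape))  # 稀疏门控的连续代理
        self.p = torch.nn.Parameter(torch.zeros(shape))  # 移位量的连续代理
        self.max_shift = max_shift

    @staticmethod
    def _ste(hard, soft):
        # 前向使用离散值 hard,反向按连续代理 soft 传梯度(直通估计器)
        return soft + (hard - soft).detach()

    def forward(self):
        sign = self._ste(torch.sign(self.s), self.s)
        gate = self._ste((self.t > 0).float(), torch.sigmoid(self.t))
        shift = self._ste(torch.round(self.p.clamp(0, self.max_shift)), self.p)
        return sign * gate * torch.pow(2.0, -shift)      # 离散的 2 的幂次权重

w = SignSparseShift((8, 8))()
print(torch.unique(w.abs()))   # 权重幅值只落在 {0, 2^-3, ..., 2^0} 上
```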

【9】 Deep Learning for Two-Sided Matching 标题:深度学习在双边匹配中的应用

作者:Sai Srivatsa Ravindranath,Zhe Feng,Shira Li,Jonathan Ma,Scott D. Kominers,David C. Parkes 机构:John A. Paulson School of Engineering and Applied Sciences, Harvard University, Harvard College, Harvard Business School 链接:https://arxiv.org/abs/2107.03427 摘要:我们首次使用多层神经网络来建模双边匹配,并探索防策略性(strategy-proofness)与稳定性之间的设计空间。众所周知,这两个性质无法同时实现,但该设计空间中的有效前沿尚不为人所知。我们的实验表明,有可能在稳定性与防策略性之间取得良好的折衷,其效果远好于对延迟接受机制(仅对市场一方稳定且防策略)与随机序列独裁机制(防策略但不稳定)做凸组合所能达到的折衷。 摘要:We initiate the use of a multi-layer neural network to model two-sided matching and to explore the design space between strategy-proofness and stability. It is well known that both properties cannot be achieved simultaneously but the efficient frontier in this design space is not understood. We show empirically that it is possible to achieve a good compromise between stability and strategy-proofness, substantially better than that achievable through a convex combination of deferred acceptance (stable and strategy-proof for only one side of the market) and randomized serial dictatorship (strategy-proof but not stable).

【10】 Benchpress: a scalable and platform-independent workflow for benchmarking structure learning algorithms for graphical models 标题:Benchpress:一种可扩展的独立于平台的工作流,用于对图形模型的结构学习算法进行基准测试

作者:Felix L. Rios,Giusi Moffa,Jack Kuipers 机构:Department of Mathematics and Computer Science, University of Basel, Spiegelgasse , Basel, Switzerland, Department of Biosystems Science and Engineering, ETH Z¨urich, Mattenstrasse , Basel, Switzerland, Editor: 备注:30 pages, 1 figure 链接:https://arxiv.org/abs/2107.03863 摘要:描述研究领域中变量之间的关系和建立数据生成机制的模型是许多实证科学中的一个基本问题。概率图形模型是解决这个问题的一种常用方法。学习图形结构在计算上具有挑战性,并且是当前研究的一个热点领域,目前正在开发大量的算法。为了便于不同方法的基准测试,我们提出了一种新的自动化工作流程,称为benchpress,用于为概率图形模型生成可伸缩、可复制和与平台无关的结构学习算法基准。Benchpress是通过一个简单的JSON文件连接的,这使得所有用户都可以访问它,而代码是以完全模块化的方式设计的,以使研究人员能够提供更多的方法。Benchpress目前提供了一个与BiDAG、bnlearn、GOBNILP、pcalg、r.blip、scikit learn、TETRAD和trilearn等库中大量最新算法的接口,以及用于数据生成模型和性能评估的各种方法。除了用户定义的模型和随机生成的数据集之外,软件工具还包括来自文献的一些标准数据集和图形模型,这些数据集和图形模型可能包含在基准工作流程中。我们在四个典型的数据场景中演示了这种学习贝叶斯网络的工作流程的适用性。源代码和文档可从http://github.com/felixleopoldo/benchpress. 摘要:Describing the relationship between the variables in a study domain and modelling the data generating mechanism is a fundamental problem in many empirical sciences. Probabilistic graphical models are one common approach to tackle the problem. Learning the graphical structure is computationally challenging and a fervent area of current research with a plethora of algorithms being developed. To facilitate the benchmarking of different methods, we present a novel automated workflow, called benchpress for producing scalable, reproducible, and platform-independent benchmarks of structure learning algorithms for probabilistic graphical models. Benchpress is interfaced via a simple JSON-file, which makes it accessible for all users, while the code is designed in a fully modular fashion to enable researchers to contribute additional methodologies. Benchpress currently provides an interface to a large number of state-of-the-art algorithms from libraries such as BiDAG, bnlearn, GOBNILP, pcalg, r.blip, scikit-learn, TETRAD, and trilearn as well as a variety of methods for data generating models and performance evaluation. Alongside user-defined models and randomly generated datasets, the software tool also includes a number of standard datasets and graphical models from the literature, which may be included in a benchmarking workflow. We demonstrate the applicability of this workflow for learning Bayesian networks in four typical data scenarios. The source code and documentation is publicly available from http://github.com/felixleopoldo/benchpress.

【11】 Elastic deformation of optical coherence tomography images of diabetic macular edema for deep-learning models training: how far to go? 标题:深度学习模型训练中糖尿病黄斑水肿光学相干断层扫描图像的弹性变形:还有多远?

作者:Daniel Bar-David,Laura Bar-David,Yinon Shapira,Rina Leibu,Dalia Dori,Ronit Schneor,Anath Fischer,Shiri Soudry 机构:Technion Israel Institute of Technology, Haifa, Israel, Department of Ophthalmology, Rambam Health Care Campus, Haifa, Israel, Discipline of Ophthalmology and Visual Science, University of Adelaide, Adelaide, South Australia 链接:https://arxiv.org/abs/2107.03651 摘要:探讨光学相干断层扫描(OCT)图像弹性形变在糖尿病性黄斑水肿(DME)深度学习模型建立中的应用价值。 摘要:To explore the clinical validity of elastic deformation of optical coherence tomography (OCT) images for data augmentation in the development of deep-learning model for detection of diabetic macular edema (DME).

【12】 A hybrid virtual sensing approach for approximating non-linear dynamic system behavior using LSTM networks 标题:利用LSTM网络逼近非线性动态系统行为的混合虚拟传感方法

作者:Leonhard Heindel,Peter Hantschke,Markus Kästner 机构:aTechnische Universit¨at Dresden, Institute of Solid Mechanics 备注:18 pages, 10 figures 链接:https://arxiv.org/abs/2107.03645 摘要:现代物联网解决方案应用于各种不同的领域,从互联车辆、医疗保健到工业应用。它们依赖于大量相互连接的传感器,这会带来技术和经济上的挑战。虚拟传感技术的目的是通过使用来自可用测量的数据来估计额外的未知感兴趣量,从而减少系统中物理传感器的数量。成功的基于模型的解决方案包括Kalman滤波器或有限元模型与模态分析的结合,而许多数据驱动的方法依赖于机器学习算法。提出的混合虚拟传感方法将长-短记忆网络与频率响应函数模型相结合,以估计具有多个输入输出通道的非线性动态系统的行为。网络训练和预测利用短信号子序列,然后通过加窗技术重新组合。频率响应函数模型作为一个基线估计,完美地捕捉线性动态系统,并通过非线性长-短期记忆网络进行增强,采用两种不同的混合建模策略。利用一个三分量伺服液压疲劳试验台的非线性实验数据对该方法进行了验证。利用时域和频域中的各种度量以及变幅下的疲劳强度来评价该方法的逼近质量。除虚拟感知外,该算法还应用于前向预测任务。在一个单独的研究中使用合成数据来估计不同大小数据集的预测质量。 摘要:Modern Internet of Things solutions are used in a variety of different areas, ranging from connected vehicles and healthcare to industrial applications. They rely on a large amount of interconnected sensors, which can lead to both technical and economical challenges. Virtual sensing techniques aim to reduce the number of physical sensors in a system by using data from available measurements to estimate additional unknown quantities of interest. Successful model-based solutions include Kalman filters or the combination of finite element models and modal analysis, while many data-driven methods rely on machine learning algorithms. The presented hybrid virtual sensing approach combines Long Short-Term Memory networks with frequency response function models in order to estimate the behavior of non-linear dynamic systems with multiple input and output channels. Network training and prediction make use of short signal subsequences, which are later recombined by applying a windowing technique. The frequency response function model acts as a baseline estimate which perfectly captures linear dynamic systems and is augmented by the non-linear Long Short-Term Memory network following two different hybrid modeling strategies. The approach is tested using a non-linear experimental dataset, which results from measurements of a three-component servo-hydraulic fatigue test bench. A variety of metrics in time and frequency domains, as well as fatigue strength under variable amplitudes are used to evaluate the approximation quality of the proposed method. In addition to virtual sensing, the algorithm is also applied to a forward prediction task. Synthetic data are used in a separate study to estimate the prediction quality on datasets of different size.
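文中"短子序列预测 + 加窗重组"的后处理可以用重叠相加(overlap-add)来示意(非论文原实现):逐段结果乘以窗函数累加,再用窗的累加值归一化,以消除拼接边界的伪影。窗长 100、步长 50 为假设。

```python
# 最小示意(非论文原实现):短子序列预测结果的加窗重组(overlap-add)
import numpy as np

def window_recombine(segments, hop, win_len):
    """segments: (n_seg, win_len) 的逐段预测;返回重组后的长信号。"""
    w = np.hanning(win_len)
    total = hop * (len(segments) - 1) + win_len
    out = np.zeros(total); norm = np.zeros(total)
    for i, seg in enumerate(segments):
        sl = slice(i * hop, i * hop + win_len)
        out[sl] += w * seg        # 逐段结果按窗加权累加
        norm[sl] += w             # 记录窗的累加值用于归一化
    return out / np.maximum(norm, 1e-8)

sig = np.sin(np.linspace(0, 20, 1000))
win_len, hop = 100, 50
segs = np.stack([sig[i:i + win_len]
                 for i in range(0, len(sig) - win_len + 1, hop)])
rec = window_recombine(segs, hop, win_len)   # 这里逐段"预测"就是原信号本身
print(np.abs(rec[win_len:-win_len] - sig[win_len:len(rec) - win_len]).max())
```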

【13】 Model Selection for Generic Contextual Bandits 标题:通用上下文Bitts的模型选择

作者:Avishek Ghosh,Abishek Sankararaman,Kannan Ramchandran 机构:Department of Electrical Eng. and Computer Science, UC Berkeley, AWS AI Labs, Palo Alto, USA, Editor: 备注:40 pages, 5 figures. arXiv admin note: text overlap with arXiv:2006.02612 链接:https://arxiv.org/abs/2107.03455 摘要:在可实现性假设下,考虑一般随机上下文bandit的模型选择问题。我们提出了一种基于逐次求精的自适应上下文Bandit(ACB)算法,该算法分阶段工作,逐次淘汰过于简单而无法拟合给定实例的模型类。我们证明了该算法是自适应的,即其遗憾率在阶数上匹配FALCON(Levi等人2020年提出的最新上下文bandit算法),而后者需要知道真实模型类。不知道正确模型类的代价只是一个加性项,它贡献于遗憾界中的二阶项。这种代价具有直观的性质:模型类越容易辨识,代价越小,反之亦然。然后,我们证明了一个更简单的explore-then-commit(ETC)风格的算法,尽管不知道真实模型类,也能获得匹配FALCON的遗憾率。然而,不出所料,ETC的模型选择代价比ACB更高。此外,将ACB应用于稀疏度未知的线性bandit设置,可在阶数意义上恢复先前由专门针对线性设置的算法建立的模型选择保证。 摘要:We consider the problem of model selection for the general stochastic contextual bandits under the realizability assumption. We propose a successive refinement based algorithm called Adaptive Contextual Bandit (ACB), that works in phases and successively eliminates model classes that are too simple to fit the given instance. We prove that this algorithm is adaptive, i.e., the regret rate order-wise matches that of FALCON, the state-of-the-art contextual bandit algorithm of Levi et al. '20, that needs knowledge of the true model class. The price of not knowing the correct model class is only an additive term contributing to the second order term in the regret bound. This cost possess the intuitive property that it becomes smaller as the model class becomes easier to identify, and vice-versa. We then show that a much simpler explore-then-commit (ETC) style algorithm also obtains a regret rate matching that of FALCON, despite not knowing the true model class. However, the cost of model selection is higher in ETC as opposed to in ACB, as expected. Furthermore, ACB applied to the linear bandit setting with unknown sparsity, order-wise recovers the model selection guarantees previously established by algorithms tailored to the linear setting.

【14】 Self-organized criticality in neural networks

Authors: Mikhail I. Katsnelson, Vitaly Vanchurin, Tom Westerhout
Affiliations: Institute for Molecules and Materials, Radboud University, The Netherlands; Duluth Institute for Advanced Study, USA
Comments: 11 pages, 4 figures
Link: https://arxiv.org/abs/2107.03402
Abstract: We demonstrate, both analytically and numerically, that the learning dynamics of neural networks is generically attracted towards a self-organized critical state. The effect can be modeled with quartic interactions between non-trainable variables (e.g. states of neurons) and trainable variables (e.g. the weight matrix). Non-trainable variables are rapidly driven towards stochastic equilibrium, while trainable variables are slowly driven towards a learning equilibrium described by a scale-invariant distribution over a wide range of scales. Our results suggest that the scale invariance observed in many physical and biological systems might be due to some kind of learning dynamics, and they support the claim that the universe might be a neural network.
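
The two-timescale picture can be played with in a few lines. The toy loss below, whose expansion contains quartic couplings between states and weights, is our own illustrative choice and not the paper's exact model: states receive fast, noisy updates while weights drift slowly.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
x = rng.standard_normal(n)              # non-trainable variables (neuron states), fast
w = 0.1 * rng.standard_normal((n, n))   # trainable variables (weights), slow

def grads(x, w):
    """Gradients of the toy loss sum_i (x_i - tanh((w @ x)_i))^2, whose
    expansion couples states and weights up to quartic order."""
    pre = w @ x
    s = 1.0 - np.tanh(pre) ** 2
    err = x - np.tanh(pre)
    gx = 2 * err - 2 * (w.T @ (err * s))
    gw = -2 * np.outer(err * s, x)
    return gx, gw

eta_fast, eta_slow, noise = 1e-2, 1e-4, 1e-2    # two separated timescales
for t in range(20_000):
    gx, gw = grads(x, w)
    x -= eta_fast * gx + noise * rng.standard_normal(n)  # fast stochastic equilibration
    w -= eta_slow * gw                                   # slow drift to learning equilibrium

# Inspect the weight magnitudes for heavy tails / approximate scale invariance.
hist, edges = np.histogram(np.abs(w).ravel(), bins=50)
```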

Others (13 papers)

【1】 Active Safety Envelopes using Light Curtains with Probabilistic Guarantees

Authors: Siddharth Ancha, Gaurav Pathak, Srinivasa G. Narasimhan, David Held
Affiliations: Carnegie Mellon University, Pittsburgh, PA, USA
Comments: 18 pages. Published at Robotics: Science and Systems (RSS) 2021
Link: https://arxiv.org/abs/2107.04000
Abstract: To safely navigate unknown environments, robots must accurately perceive dynamic obstacles. Instead of directly measuring scene depth with a LiDAR sensor, we explore the use of a much cheaper and higher-resolution sensor: programmable light curtains. Light curtains are controllable depth sensors that sense only along a surface that a user selects. We use light curtains to estimate the safety envelope of a scene: a hypothetical surface that separates the robot from all obstacles. We show that generating light curtains that sense random locations (from a particular distribution) can quickly discover the safety envelope for scenes with unknown objects. Importantly, we produce theoretical safety guarantees on the probability of detecting an obstacle using random curtains. We combine random curtains with a machine-learning-based model that forecasts and tracks the motion of the safety envelope efficiently. Our method accurately estimates safety envelopes while providing probabilistic safety guarantees that can be used to certify the efficacy of a robot perception system to detect and avoid dynamic obstacles. We evaluate our approach in a simulated urban driving environment and in a real-world environment with moving pedestrians using a light curtain device, and show that we can estimate safety envelopes efficiently and effectively. Project website: https://siddancha.github.io/projects/active-safety-envelopes-with-guarantees
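
A toy version of envelope discovery with random curtains, with the scene reduced to one depth value per camera ray, might look as follows; the detection rule and curtain distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_rays, max_depth = 128, 10.0
scene = rng.uniform(2.0, 9.0, n_rays)        # true obstacle depth per camera ray (unknown)

envelope = np.full(n_rays, max_depth)        # current estimate of the safety envelope
for _ in range(200):
    # A random curtain: one sensing depth per ray, drawn from a chosen distribution.
    curtain = rng.uniform(1.0, max_depth, n_rays)
    # The curtain returns light where it is placed (approximately) on an object surface.
    hits = np.abs(curtain - scene) < 0.1
    envelope[hits] = np.minimum(envelope[hits], curtain[hits])
# In practice the envelope would sit slightly in front of the detected surfaces.
```

The actual system additionally derives analytical detection probabilities for random curtains and pairs them with a learned model that forecasts the envelope's motion.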

【2】 A Long Short-Term Memory for AI Applications in Spike-based Neuromorphic Hardware

Authors: Philipp Plank, Arjun Rao, Andreas Wild, Wolfgang Maass
Affiliations: Institute of Theoretical Computer Science, Graz University of Technology, Graz, Austria; Intel Labs, Intel Corporation, Hillsboro, OR, USA; Intel Labs, Intel Corporation, Neubiberg, Germany
Comments: Philipp Plank and Arjun Rao have contributed equally to this work as first authors
Link: https://arxiv.org/abs/2107.03992
Abstract: In spite of intensive efforts, it has remained an open problem to what extent current Artificial Intelligence (AI) methods that employ Deep Neural Networks (DNNs) can be implemented more energy-efficiently on spike-based neuromorphic hardware. This holds in particular for AI methods that solve sequence processing tasks, a primary application target for spike-based neuromorphic hardware. One difficulty is that DNNs for such tasks typically employ Long Short-Term Memory (LSTM) units, yet an efficient emulation of these units in spike-based hardware has been missing. We present a biologically inspired solution that solves this problem. This solution enables us to implement a major class of DNNs for sequence processing tasks, such as time series classification and question answering, with substantial energy savings on neuromorphic hardware. In fact, the Relational Network for reasoning about relations between objects that we use for question answering is the first example of a large DNN that carries out a sequence processing task with substantial energy savings on neuromorphic hardware.
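
As background, one common spiking building block for longer memory is the adaptive leaky integrate-and-fire (ALIF) neuron, whose slowly decaying threshold adaptation retains information well beyond the membrane time constant. The minimal simulation below uses illustrative parameters and makes no claim to match the paper's hardware implementation.

```python
import numpy as np

def alif_step(v, a, I, dt=1e-3, tau_v=20e-3, tau_a=200e-3, b0=1.0, beta=0.5):
    """One Euler step of an adaptive leaky integrate-and-fire (ALIF) neuron.
    The slowly decaying adaptation variable `a` raises the firing threshold
    after every spike, giving the unit memory on much longer timescales than
    the membrane time constant."""
    v = v + dt / tau_v * (-v + I)            # leaky membrane integration
    threshold = b0 + beta * a                # adaptive firing threshold
    spikes = v >= threshold
    v = np.where(spikes, 0.0, v)             # reset membrane on spike
    a = a * np.exp(-dt / tau_a) + spikes     # spike-triggered threshold adaptation
    return v, a, spikes

v = np.zeros(4)
a = np.zeros(4)
for t in range(1_000):
    I = np.full(4, 1.5)                      # constant drive (illustrative)
    v, a, s = alif_step(v, a, I)             # firing rate adapts over time
```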

【3】 On Margins and Derandomisation in PAC-Bayes

Authors: Felix Biggs, Benjamin Guedj
Affiliations: Centre for Artificial Intelligence, University College London, United Kingdom; Inria
Link: https://arxiv.org/abs/2107.03955
Abstract: We develop a framework for derandomising PAC-Bayesian generalisation bounds achieving a margin on training data, relating this process to the concentration-of-measure phenomenon. We apply these tools to linear prediction, single-hidden-layer neural networks with an unusual erf activation function, and deep ReLU networks, obtaining new bounds. The approach is also extended to the idea of "partial derandomisation", where only some layers are derandomised and the others are stochastic. This allows empirical evaluation of single-hidden-layer networks on more complex datasets, and helps bridge the gap between generalisation bounds for non-stochastic deep networks and those for randomised deep networks as generally examined in PAC-Bayes.
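
For orientation, the classical PAC-Bayes bound that such derandomisation arguments typically start from can be stated as follows; the margin step is summarized only informally.

```latex
% Classical PAC-Bayes bound (Maurer/Seeger form): for any prior P fixed before
% seeing the n i.i.d. training samples, with probability at least 1 - \delta,
% simultaneously for all posteriors Q over predictors,
\[
  \mathrm{kl}\!\left(\hat{L}(Q)\,\middle\|\,L(Q)\right)
  \;\le\; \frac{\mathrm{KL}(Q\,\|\,P) + \ln\frac{2\sqrt{n}}{\delta}}{n},
\]
% where \hat{L}(Q) and L(Q) are the empirical and true risks of the randomised
% predictor and kl is the binary relative entropy. Derandomisation via margins,
% informally: if the stochastic predictor attains margin \gamma on the training
% data and random perturbations move its outputs by less than \gamma with high
% probability (concentration of measure), then a deterministic predictor's risk
% inherits a bound of this form on the margin loss.
```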

【4】 Patient Embeddings in Healthcare and Insurance Applications

Authors: Pavel Blinov, Vladimir Kokh
Affiliations: Sber Artificial Intelligence Laboratory, Moscow, Russia
Link: https://arxiv.org/abs/2107.03913
Abstract: This paper studies the problem of concept and patient representations in the medical domain. We represent patient histories from Electronic Health Records (EHRs) as temporal sequences of ICD concepts, for which embeddings are learned in an unsupervised setup with a transformer-based neural network model. The model was trained on a collection of one million patients' histories spanning 6 years. The predictive power of such a model is assessed in comparison with several baseline methods. A series of experiments on the MIMIC-III data shows the advantage of the presented model compared to a similar system. Further, we analyze the obtained embedding space with regard to concept relations and show how knowledge from the medical domain can be successfully transferred, in the form of patient embeddings, to the practical task of insurance scoring.
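
A minimal sketch of this kind of model: a transformer encoder over ICD-code sequences trained with a masked-code objective, yielding a pooled patient embedding. The vocabulary size, masking scheme and pooling below are assumptions, since the paper's exact unsupervised setup may differ.

```python
import torch
import torch.nn as nn

VOCAB, PAD, MASK = 2000, 0, 1                  # ICD concept vocabulary (illustrative size)

class PatientEncoder(nn.Module):
    def __init__(self, d=128, heads=4, layers=2, max_len=256):
        super().__init__()
        self.emb = nn.Embedding(VOCAB, d, padding_idx=PAD)
        self.pos = nn.Embedding(max_len, d)    # temporal order of codes in the history
        layer = nn.TransformerEncoderLayer(d, heads, batch_first=True)
        self.enc = nn.TransformerEncoder(layer, layers)
        self.out = nn.Linear(d, VOCAB)         # predicts the masked ICD code

    def forward(self, codes):
        pos = torch.arange(codes.size(1), device=codes.device)
        h = self.enc(self.emb(codes) + self.pos(pos))
        return self.out(h), h.mean(dim=1)      # token logits + pooled patient embedding

model = PatientEncoder()
codes = torch.randint(2, VOCAB, (8, 32))       # batch of ICD-code histories
masked = codes.clone()
masked[:, 5] = MASK                            # mask one code per history
logits, patient_vec = model(masked)
loss = nn.functional.cross_entropy(logits[:, 5, :], codes[:, 5])
```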

【5】 The Price of Diversity

Authors: Hari Bandi, Dimitris Bertsimas
Affiliations: Sloan School of Management and Operations Research Center, Massachusetts Institute of Technology, Cambridge, MA
Link: https://arxiv.org/abs/2107.03900
Abstract: Systemic bias with respect to gender, race and ethnicity, often unconscious, is prevalent in datasets involving choices among individuals. Consequently, society has found it challenging to alleviate bias and achieve diversity in a way that maintains meritocracy in such settings. We propose (a) a novel optimization approach, based on optimally flipping outcome labels and training classification models simultaneously, to discover changes to be made in the selection process so as to achieve diversity without significantly affecting meritocracy, and (b) a novel implementation tool, employing optimal classification trees, to provide insights on which attributes of individuals lead to the flipping of their labels and to help make changes in current selection processes in a manner understandable by human decision makers. We present case studies on three real-world datasets covering parole, admission to the bar, and lending decisions, and demonstrate that the price of diversity is low and sometimes negative: we can modify our selection processes in a way that enhances diversity without significantly affecting meritocracy, and sometimes even improves it.
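
A greedy toy version of idea (a): flip a budgeted number of rejection labels for the under-represented group, accepting a flip only if a retrained classifier's fit (a crude stand-in for meritocracy) barely changes. The data, thresholds and greedy scheme are synthetic illustrations; the paper solves this jointly as an optimization problem and explains the flips with optimal classification trees.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 500
X = rng.standard_normal((n, 5))                       # applicant features
group = rng.integers(0, 2, n)                         # protected attribute
y = (X[:, 0] + 0.5 * group + rng.normal(0, 1, n) > 0).astype(int)  # selection labels

def fit_and_score(labels):
    clf = LogisticRegression(max_iter=1000).fit(X, labels)
    return clf, clf.score(X, labels)                  # crude stand-in for meritocracy

clf, base_acc = fit_and_score(y)
y_new, budget, tol = y.copy(), 20, 0.02
for _ in range(budget):
    # Candidates: rejected members of the group under-represented among the selected.
    selected_counts = np.bincount(group[y_new == 1], minlength=2)
    under = group == selected_counts.argmin()
    cand = np.where(under & (y_new == 0))[0]
    if len(cand) == 0:
        break
    i = cand[np.argmax(clf.predict_proba(X[cand])[:, 1])]  # strongest rejected candidate
    y_try = y_new.copy()
    y_try[i] = 1
    clf_try, acc = fit_and_score(y_try)
    if acc >= base_acc - tol:                         # accept flip only if fit barely changes
        y_new, clf = y_try, clf_try
```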

【6】 Augmented Data as an Auxiliary Plug-in Towards Categorization of Crowdsourced Heritage Data

Authors: Shashidhar Veerappa Kudari, Akshaykumar Gunari, Adarsh Jamadandi, Ramesh Ashok Tabib, Uma Mudenagudi
Affiliations: KLE Technological University, Hubballi
Link: https://arxiv.org/abs/2107.03852
Abstract: In this paper, we propose a strategy to mitigate the problem of inefficient clustering performance by introducing data augmentation as an auxiliary plug-in. Classical clustering techniques such as k-means, Gaussian mixture models and spectral clustering are central to many data-driven applications. More recently, however, unsupervised simultaneous feature learning and clustering using neural networks, also known as Deep Embedded Clustering (DEC), has gained prominence. Pioneering works on deep feature clustering focus on defining a relevant clustering loss function and choosing the right neural network for extracting features. A central problem in all these cases is data sparsity accompanied by high intra-class and low inter-class variance, which subsequently leads to poor clustering performance and erroneous candidate assignments. To address this, we employ data augmentation techniques to improve the density of the clusters, thus improving overall performance. We train a variant of a Convolutional Autoencoder (CAE) with augmented data to construct the initial feature space as a novel model for deep clustering. We demonstrate the results of the proposed strategy on a crowdsourced Indian Heritage dataset. Extensive experiments show consistent improvements over existing works.
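
A minimal sketch of the pipeline: a convolutional autoencoder trained on augmented images, followed by k-means on the learned latent codes. The architecture, the single flip augmentation and the cluster count are illustrative placeholders.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class CAE(nn.Module):
    def __init__(self, latent=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(32 * 8 * 8, latent))
        self.dec = nn.Sequential(
            nn.Linear(latent, 32 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())
    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

def augment(x):
    """Single illustrative augmentation (random horizontal flip); the paper's
    augmentation set is richer."""
    return torch.flip(x, dims=[3]) if torch.rand(()) < 0.5 else x

images = torch.rand(256, 3, 32, 32)          # stand-in for the heritage images
model = CAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(10):
    x = augment(images)
    recon, _ = model(x)
    loss = nn.functional.mse_loss(recon, x)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():                        # cluster in the learned feature space
    _, z = model(images)
labels = KMeans(n_clusters=10, n_init=10).fit_predict(z.numpy())
```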

【7】 Collaboration of Experts: Achieving 80% Top-1 Accuracy on ImageNet with 100M FLOPs

Authors: Yikang Zhang, Zhuo Chen, Zhao Zhong
Affiliations: Huawei
Link: https://arxiv.org/abs/2107.03815
Abstract: In this paper, we propose a Collaboration of Experts (CoE) framework to pool together the expertise of multiple networks towards a common aim. Each expert is an individual network with expertise on a unique portion of the dataset, which enhances the collective capacity. Given a sample, an expert is selected by the delegator, which simultaneously outputs a rough prediction to support early termination. To fulfill this framework, we propose three modules that impel each model to play its role, namely the weight generation module (WGM), the label generation module (LGM) and the variance calculation module (VCM). Our method achieves state-of-the-art performance on ImageNet: 80.7% top-1 accuracy with 194M FLOPs. Combined with the PWLU activation function and CondConv, CoE further achieves 80.0% accuracy with only 100M FLOPs for the first time. More importantly, our method is hardware friendly, achieving a 3-6x speedup compared with some existing conditional computation approaches.
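
Structurally, the framework might be sketched as below: a delegator emits both a routing decision and a rough prediction that supports early termination, and only unconfident samples are sent to an expert. This shows the forward pass only, not the paper's WGM/LGM/VCM training modules, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class CoE(nn.Module):
    def __init__(self, n_experts=4, dim=64, n_classes=10, conf_thresh=0.9):
        super().__init__()
        # The delegator both routes to an expert and makes a rough prediction.
        self.delegator = nn.Sequential(nn.Linear(dim, 128), nn.ReLU(),
                                       nn.Linear(128, n_experts + n_classes))
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 256), nn.ReLU(), nn.Linear(256, n_classes))
            for _ in range(n_experts))
        self.n_experts, self.conf_thresh = n_experts, conf_thresh

    def forward(self, x):
        out = self.delegator(x)
        route = out[:, :self.n_experts].argmax(dim=1)      # expert selection
        rough = out[:, self.n_experts:].softmax(dim=1)     # rough prediction
        preds = rough.clone()
        for b in range(x.size(0)):
            # Early termination: a confident rough prediction skips the expert.
            if rough[b].max() < self.conf_thresh:
                preds[b] = self.experts[int(route[b])](x[b:b + 1]).softmax(dim=1)[0]
        return preds

model = CoE()
probs = model(torch.randn(8, 64))                          # (batch, n_classes)
```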

【8】 Bag of Tricks for Neural Architecture Search

Authors: Thomas Elsken, Benedikt Staffler, Arber Zela, Jan Hendrik Metzen, Frank Hutter
Affiliations: Bosch Center for Artificial Intelligence; University of Freiburg
Link: https://arxiv.org/abs/2107.03719
Abstract: While neural architecture search methods have been successful in previous years and have led to new state-of-the-art performance on various problems, they have also been criticized for being unstable, for being highly sensitive with respect to their hyperparameters, and for often not performing better than random search. To shed some light on this issue, we discuss some practical considerations that help improve stability, efficiency and overall performance.

【9】 Digitizing Handwriting with a Sensor Pen: A Writer-Independent Recognizer

Authors: Mohamad Wehbi, Tim Hamann, Jens Barth, Bjoern Eskofier
Affiliations: Machine Learning and Data Analytics Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; STABILO International GmbH, Heroldsberg, Germany
Comments: Published in the 2020 17th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Link: https://arxiv.org/abs/2107.03704
Abstract: Online handwriting recognition has been studied for a long time, with only a few practicable results when writing on normal paper. Previous approaches using sensor-based devices encountered problems that limited the usage of the developed systems in real-world applications. This paper presents a writer-independent system that recognizes characters written on plain paper with the use of a sensor-equipped pen. The system is applicable in real-world settings and requires no user-specific training for recognition. The pen provides linear acceleration, angular velocity, magnetic field, and the force applied by the user, and acts as a digitizer that transforms the analogue signals of the sensors into time-series data while writing on regular paper. The dataset we collected with this pen consists of lower-case and upper-case Latin letters. We present the results of a convolutional neural network model for letter classification and show that this approach is practical and achieves promising results for writer-independent character recognition. This work aims at providing a real-time handwriting recognition system for writing on normal paper.
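
A plausible baseline for this task is a 1-D CNN over the multi-channel sensor time series. The channel count, sequence length and architecture below are assumptions, not the paper's exact network.

```python
import torch
import torch.nn as nn

N_CHANNELS, N_CLASSES, T = 13, 52, 256   # sensor channels (illustrative), 26+26 letters

class PenCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(N_CHANNELS, 32, 7, padding=3), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool1d(2),
            nn.Conv1d(64, 128, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),   # length-invariant pooling
            nn.Linear(128, N_CLASSES))
    def forward(self, x):                # x: (batch, channels, time)
        return self.net(x)

model = PenCNN()
logits = model(torch.randn(16, N_CHANNELS, T))
```

The adaptive pooling makes the classifier independent of stroke duration, which matters because characters are written at varying speeds.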

【10】 Differentiable Random Access Memory using Lattices

Authors: Adam P. Goucher, Rajan Troll
Affiliations: Department of Pure Mathematics and Mathematical Statistics, University of Cambridge; InBalance
Comments: 11 pages, 3 figures, submitted to NeurIPS 2021
Link: https://arxiv.org/abs/2107.03474
Abstract: We introduce a differentiable random access memory module with $O(1)$ performance regardless of size, scaling to billions of entries. The design stores entries on points of a chosen lattice and exploits symmetries to calculate nearest neighbours of arbitrary points efficiently. Augmenting a standard neural network architecture with a single memory layer based on this design, we can scale the parameter count up to memory limits with negligible computational overhead, giving better accuracy at similar cost. On large language modelling tasks, these enhanced models with larger capacity significantly outperform the unmodified transformer baseline. We found continued scaling with memory size up to the limits tested.
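
A toy version of the addressing scheme, using the integer lattice Z^d and a random hash from lattice points to table slots; the paper's design uses better lattices and exploits their symmetries, which is not reproduced here.

```python
import torch
import torch.nn as nn

class LatticeMemory(nn.Module):
    """Toy O(1) memory: keys live on points of the integer lattice Z^d; a random
    hash maps each lattice point to a slot in a fixed-size value table."""
    def __init__(self, dim=8, slots=2**16, value_dim=32):
        super().__init__()
        self.values = nn.Parameter(torch.zeros(slots, value_dim))
        self.slots = slots
        self.register_buffer("hash_w", torch.randint(1, 10**9, (dim,)))

    def forward(self, q):                    # q: (batch, dim), arbitrary points
        p = torch.round(q).long()            # nearest point of Z^d, O(1) per query
        idx = (p * self.hash_w).sum(dim=1) % self.slots
        # Rounding blocks gradients to q; gradients flow into the stored values.
        return self.values[idx]

mem = LatticeMemory()
out = mem(torch.randn(4, 8) * 3.0)
out.pow(2).mean().backward()                 # only the addressed slots receive gradients
```

Because each query touches a constant number of slots, the lookup cost stays flat as the table grows, which is what lets the parameter count scale to memory limits.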

【11】 CHASE: Robust Visual Tracking via Cell-Level Differentiable Neural Architecture Search

Authors: Seyed Mojtaba Marvasti-Zadeh, Javad Khaghani, Li Cheng, Hossein Ghanei-Yakhdan, Shohreh Kasaei
Affiliations: Vision and Learning Lab, University of Alberta, Edmonton, Canada; Digital Image & Video Processing Lab, Yazd University, Yazd, Iran; Image Processing Lab, Sharif University of Technology, Tehran, Iran
Comments: The first two authors contributed equally to this work
Link: https://arxiv.org/abs/2107.03463
Abstract: A strong visual object tracker nowadays relies on its well-crafted modules, which typically consist of manually designed network architectures, to deliver high-quality tracking results. Unsurprisingly, this manual design process becomes a particularly challenging barrier, as it demands sufficient prior experience, enormous effort, intuition, and perhaps some good luck. Meanwhile, neural architecture search has been gaining ground in practical applications such as image segmentation as a promising method for automating the search for feasible network structures. In this work, we propose a novel cell-level differentiable architecture search mechanism to automate the network design of the tracking module, aiming to adapt backbone features to the objective of the tracking network during offline training. The proposed approach is simple and efficient, and there is no need to stack a series of modules to construct a network. Our approach is easy to incorporate into existing trackers, which is empirically validated using different differentiable architecture-search-based methods and tracking objectives. Extensive experimental evaluations demonstrate the superior performance of our approach over five commonly used benchmarks. Meanwhile, our automated searching process takes 41 (18) hours for the second-order (first-order) DARTS method on the TrackingNet dataset.
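
The cell-level search builds on DARTS-style mixed operations; a minimal sketch of one such edge, with an illustrative candidate-operation set, is below. The tracking-specific objective and cell wiring of CHASE are not reproduced.

```python
import torch
import torch.nn as nn

class ZeroOp(nn.Module):
    """Represents the absence of a connection on an edge."""
    def forward(self, x):
        return x * 0.0

OPS = {
    "skip":  lambda C: nn.Identity(),
    "conv3": lambda C: nn.Sequential(nn.ReLU(), nn.Conv2d(C, C, 3, padding=1)),
    "conv5": lambda C: nn.Sequential(nn.ReLU(), nn.Conv2d(C, C, 5, padding=2)),
    "zero":  lambda C: ZeroOp(),
}

class MixedOp(nn.Module):
    """A DARTS-style mixed edge: the output is a softmax-weighted sum of all
    candidate operations; architecture parameters alpha are learned by gradient
    descent alongside the network weights, then discretized after the search."""
    def __init__(self, C):
        super().__init__()
        self.ops = nn.ModuleList(op(C) for op in OPS.values())
        self.alpha = nn.Parameter(1e-3 * torch.randn(len(OPS)))
    def forward(self, x):
        w = torch.softmax(self.alpha, dim=0)
        return sum(wi * op(x) for wi, op in zip(w, self.ops))

edge = MixedOp(C=16)
y = edge(torch.randn(2, 16, 32, 32))          # (batch, channels, H, W)
```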

【12】 Measuring Financial Time Series Similarity With a View to Identifying Profitable Stock Market Opportunities

Authors: Rian Dolphin, Barry Smyth, Yang Xu, Ruihai Dong
Affiliations: School of Computer Science, University College Dublin, Dublin, Ireland; Insight Centre for Data Analytics, University College Dublin, Dublin, Ireland; School of Economics and Management, Beihang University, Beijing, China
Comments: 15 pages. Accepted for presentation at the International Conference on Case-Based Reasoning 2021 (ICCBR)
Link: https://arxiv.org/abs/2107.03926
Abstract: Forecasting stock returns is a challenging problem due to the highly stochastic nature of the market and the vast array of factors and events that can influence trading volume and prices. Nevertheless, it has proven to be an attractive target for machine learning research because of the potential for even modest levels of prediction accuracy to deliver significant benefits. In this paper, we describe a case-based reasoning approach to predicting stock market returns using only historical pricing data. We argue that one of the impediments to case-based stock prediction has been the lack of a suitable similarity metric for identifying similar pricing histories as the basis for a future prediction; traditional Euclidean and correlation-based approaches are not effective, for a variety of reasons. In this regard, a key contribution of this work is the development of a novel similarity metric for comparing historical pricing data. We demonstrate the benefits of this metric and the case-based approach in a real-world application, in comparison with a variety of conventional benchmarks.
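
A skeleton of the case-based pipeline: slide a window over the pricing history to build cases, retrieve the k most similar patterns under a pluggable similarity, and reuse their outcomes as the forecast. The z-normalized Euclidean placeholder below stands in for the paper's novel metric, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
history = rng.normal(0.0005, 0.01, 2000)       # daily returns of one instrument (synthetic)

W, H = 20, 5                                   # case window and prediction horizon
cases = [(history[i:i + W], history[i + W:i + W + H].sum())  # (pattern, future return)
         for i in range(len(history) - W - H)]

def similarity(a, b):
    """Placeholder metric; the paper develops a purpose-built alternative to
    plain Euclidean/correlation measures."""
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return -np.linalg.norm(a - b)

query = history[-W:]
k = 10
ranked = sorted(cases, key=lambda c: similarity(query, c[0]), reverse=True)[:k]
forecast = np.mean([future for _, future in ranked])   # CBR prediction: reuse neighbours
```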

【13】 Modality Completion via Gaussian Process Prior Variational Autoencoders for Multi-Modal Glioma Segmentation

Authors: Mohammad Hamghalam, Alejandro F. Frangi, Baiying Lei, Amber L. Simpson
Affiliations: School of Computing, Queen's University, Kingston, ON, Canada; Department of Electrical, Biomedical, and Mechatronics Engineering, Qazvin Branch, Azad University, Qazvin, Iran
Comments: Accepted at MICCAI 2021
Link: https://arxiv.org/abs/2107.03442
Abstract: In large studies involving multi-protocol Magnetic Resonance Imaging (MRI), one or more sub-modalities may be missing for a given patient owing to poor quality (e.g. imaging artifacts), failed acquisitions, or imaging examinations interrupted halfway. In some cases, certain protocols are unavailable due to limited scan time, or when retrospectively harmonising the imaging protocols of two independent studies. Missing image modalities pose a challenge to segmentation frameworks, as the complementary information contributed by the missing scans is lost. In this paper, we propose a novel model, the Multi-modal Gaussian Process Prior Variational Autoencoder (MGP-VAE), to impute one or more missing sub-modalities for a patient scan. MGP-VAE leverages a Gaussian Process (GP) prior on the Variational Autoencoder (VAE) to exploit correlations across subjects/patients and sub-modalities. Instead of designing one network for each possible subset of present sub-modalities or using frameworks to mix feature maps, missing data can be generated from a single model based on all the available samples. We show the applicability of MGP-VAE to brain tumor segmentation where one, two, or three of the four sub-modalities may be missing. Our experiments against competitive segmentation baselines with missing sub-modalities on the BraTS'19 dataset indicate the effectiveness of the MGP-VAE model for segmentation tasks.
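
A heavily simplified imputation sketch: a VAE that encodes the available sub-modalities and decodes the missing one, with the paper's Gaussian-process prior over subjects and sub-modalities replaced by a standard normal prior. Volume sizes and the architecture are placeholders.

```python
import torch
import torch.nn as nn

class ModalityVAE(nn.Module):
    """Simplified sketch: encode the available sub-modalities, decode the
    missing one. The paper's multi-modal GP prior is replaced here by a
    standard normal prior, so the cross-subject correlations are not modeled."""
    def __init__(self, in_mods=3, out_mods=1, z=64):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv3d(in_mods, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Flatten(), nn.Linear(16 * 16**3, 2 * z))
        self.dec = nn.Sequential(
            nn.Linear(z, 16 * 16**3), nn.ReLU(),
            nn.Unflatten(1, (16, 16, 16, 16)),
            nn.ConvTranspose3d(16, out_mods, 4, stride=2, padding=1))

    def forward(self, x_avail):
        mu, logvar = self.enc(x_avail).chunk(2, dim=1)
        zs = mu + torch.randn_like(mu) * (0.5 * logvar).exp()  # reparameterization
        return self.dec(zs), mu, logvar

model = ModalityVAE()
x = torch.randn(2, 3, 32, 32, 32)              # three available MRI sub-modalities
recon, mu, logvar = model(x)                   # imputed missing sub-modality
kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).mean()
```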
