
Artificial Intelligence arXiv Digest [7.27]

By the official account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-07-28 14:56:47
This article is included in the column: arXiv每日学术速递 (arXiv Daily Academic Digest)

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, bookmarking, and posting features!

cs.AI (Artificial Intelligence): 69 papers in total

【1】 Contextual Transformer Networks for Visual Recognition

Authors: Yehao Li, Ting Yao, Yingwei Pan, Tao Mei
Affiliations: JD AI Research, Beijing, China
Note: Rank 1 in open-set image classification task of Open World Vision Challenge @ CVPR 2021; the source code and models are publicly available at: \url{this https URL}
Link: https://arxiv.org/abs/2107.12292
Abstract: Transformer with self-attention has led to the revolutionizing of natural language processing field, and recently inspires the emergence of Transformer-style architecture design with competitive results in numerous computer vision tasks. Nevertheless, most of existing designs directly employ self-attention over a 2D feature map to obtain the attention matrix based on pairs of isolated queries and keys at each spatial location, but leave the rich contexts among neighbor keys under-exploited. In this work, we design a novel Transformer-style module, i.e., Contextual Transformer (CoT) block, for visual recognition. Such design fully capitalizes on the contextual information among input keys to guide the learning of dynamic attention matrix and thus strengthens the capacity of visual representation. Technically, CoT block first contextually encodes input keys via a $3\times3$ convolution, leading to a static contextual representation of inputs. We further concatenate the encoded keys with input queries to learn the dynamic multi-head attention matrix through two consecutive $1\times1$ convolutions.
The learnt attention matrix is multiplied by input values to achieve the dynamic contextual representation of inputs. The fusion of the static and dynamic contextual representations is finally taken as the output. Our CoT block is appealing in the view that it can readily replace each $3\times3$ convolution in ResNet architectures, yielding a Transformer-style backbone named as Contextual Transformer Networks (CoTNet). Through extensive experiments over a wide range of applications (e.g., image recognition, object detection and instance segmentation), we validate the superiority of CoTNet as a stronger backbone. Source code is available at \url{https://github.com/JDAI-CV/CoTNet}.
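The CoT data flow above (static context from a local convolution over keys, then a dynamic, attention-weighted aggregation conditioned on context and query) can be sketched as a deliberately simplified, parameter-free 1-D toy. The local averaging and the score function below are hand-picked stand-ins for the learned $3\times3$ and $1\times1$ convolutions, not the paper's implementation:

```python
import math

def cot_block_1d(x, radius=1):
    """Toy 1-D analog of the CoT block's data flow (not the real layer).

    x serves as queries, keys and values at once; the learned 3x3 and
    1x1 convolutions are replaced by fixed, parameter-free stand-ins."""
    n = len(x)
    # 1) Static context: local averaging stands in for the 3x3 conv over keys.
    static = []
    for i in range(n):
        window = x[max(0, i - radius):min(n, i + radius + 1)]
        static.append(sum(window) / len(window))
    out = []
    for i in range(n):
        # 2) Dynamic attention from (contextualized key, query) pairs,
        #    standing in for the two consecutive 1x1 convolutions.
        scores = [static[j] * x[i] for j in range(n)]
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        # 3) Dynamic context: attention-weighted sum of the values.
        dynamic = sum(e / z * v for e, v in zip(exps, x))
        # 4) Fuse the static and dynamic representations.
        out.append(static[i] + dynamic)
    return out

out = cot_block_1d([1.0, 2.0, 3.0])
```

The point of the sketch is only the four-step pipeline; swapping the stand-ins for learned convolutions recovers the structure the abstract describes.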

【2】 Meta-Learning Adversarial Domain Adaptation Network for Few-Shot Text Classification

Authors: ChengCheng Han, Zeqiu Fan, Dongxiang Zhang, Minghui Qiu, Ming Gao, Aoying Zhou
Affiliations: School of Data Science and Engineering, East China Normal University; College of Computer Science and Technology, Zhejiang University; Alibaba Group
Link: https://arxiv.org/abs/2107.12262
Abstract: Meta-learning has emerged as a trending technique to tackle few-shot text classification and achieved state-of-the-art performance. However, existing solutions heavily rely on the exploitation of lexical features and their distributional signatures on training data, while neglecting to strengthen the model's ability to adapt to new tasks. In this paper, we propose a novel meta-learning framework integrated with an adversarial domain adaptation network, aiming to improve the adaptive ability of the model and generate high-quality text embedding for new classes. Extensive experiments are conducted on four benchmark datasets and our method demonstrates clear superiority over the state-of-the-art models in all the datasets. In particular, the accuracy of 1-shot and 5-shot classification on the dataset of 20 Newsgroups is boosted from 52.1% to 59.6%, and from 68.3% to 77.8%, respectively.

【3】 DYPLODOC: Dynamic Plots for Document Classification

Authors: Anastasia Malysheva, Alexey Tikhonov, Ivan P. Yamshchikov
Affiliations: Open Data Science, Moscow, Russia / Berlin, Germany; LEYA Lab, Yandex; Higher School of Economics, St. Petersburg, Russia
Link: https://arxiv.org/abs/2107.12226
Abstract: Narrative generation and analysis are still on the fringe of modern natural language processing yet are crucial in a variety of applications. This paper proposes a feature extraction method for plot dynamics. We present a dataset that consists of the plot descriptions for thirteen thousand TV shows alongside meta-information on their genres and dynamic plots extracted from them. We validate the proposed tool for plot dynamics extraction and discuss possible applications of this method to the tasks of narrative analysis and generation.

【4】 Thought Flow Nets: From Single Predictions to Trains of Model Thought

Authors: Hendrik Schuff, Heike Adel, Ngoc Thang Vu
Affiliations: Bosch Center for Artificial Intelligence, Renningen, Germany; Institut für Maschinelle Sprachverarbeitung, University of Stuttgart
Link: https://arxiv.org/abs/2107.12220
Abstract: When humans solve complex problems, they rarely come up with a decision right-away. Instead, they start with an intuitive decision, reflect upon it, spot mistakes, resolve contradictions and jump between different hypotheses. Thus, they create a sequence of ideas and follow a train of thought that ultimately reaches a conclusive decision. Contrary to this, today's neural classification models are mostly trained to map an input to one single and fixed output. In this paper, we investigate how we can give models the opportunity of a second, third and $k$-th thought. We take inspiration from Hegel's dialectics and propose a method that turns an existing classifier's class prediction (such as the image class forest) into a sequence of predictions (such as forest $\rightarrow$ tree $\rightarrow$ mushroom). Concretely, we propose a correction module that is trained to estimate the model's correctness as well as an iterative prediction update based on the prediction's gradient. Our approach results in a dynamic system over class probability distributions, the thought flow. We evaluate our method on diverse datasets and tasks from computer vision and natural language processing.
We observe surprisingly complex but intuitive behavior and demonstrate that our method (i) can correct misclassifications, (ii) strengthens model performance, (iii) is robust to high levels of adversarial attacks, (iv) can increase accuracy up to 4% in a label-distribution-shift setting and (v) provides a tool for model interpretability that uncovers model knowledge which otherwise remains invisible in a single distribution prediction.
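The iterative update at the heart of the thought flow (a correction signal repeatedly nudging the class logits, producing a sequence of predictions) can be illustrated with a toy in which a hand-crafted score gradient replaces the paper's learned correction module:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]

def thought_flow(logits, score_grad, steps=30, lr=2.0):
    """Toy 'thought flow': repeatedly nudge the class logits along the
    gradient of a correctness score and record the argmax sequence.
    A hand-crafted gradient stands in for the learned correction module."""
    flow = []
    for _ in range(steps):
        probs = softmax(logits)
        flow.append(max(range(len(probs)), key=probs.__getitem__))
        grad = score_grad(logits)
        logits = [v + lr * g for v, g in zip(logits, grad)]
    return flow

# Hand-crafted score whose gradient pulls probability mass toward class 2.
target = [0.0, 0.0, 1.0]
def grad_fn(logits):
    return [t - p for t, p in zip(target, softmax(logits))]

flow = thought_flow([2.0, 1.0, 0.0], grad_fn)
# the prediction sequence drifts from the initial class 0 toward class 2
```

This mirrors the forest → tree → mushroom example only structurally: the real method learns the correctness estimate instead of assuming a target.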

【5】 Hindsight Value Function for Variance Reduction in Stochastic Dynamic Environment

Authors: Jiaming Guo, Rui Zhang, Xishan Zhang, Shaohui Peng, Qi Yi, Zidong Du, Xing Hu, Qi Guo, Yunji Chen
Affiliations: SKL of Computer Architecture, Institute of Computing Technology, CAS, Beijing, China; Cambricon Technologies; University of Chinese Academy of Sciences, China; University of Science and Technology of China
Link: https://arxiv.org/abs/2107.12216
Abstract: Policy gradient methods are appealing in deep reinforcement learning but suffer from high variance of gradient estimate. To reduce the variance, the state value function is applied commonly. However, the effect of the state value function becomes limited in stochastic dynamic environments, where the unexpected state dynamics and rewards will increase the variance. In this paper, we propose to replace the state value function with a novel hindsight value function, which leverages the information from the future to reduce the variance of the gradient estimate for stochastic dynamic environments. Particularly, to obtain an ideally unbiased gradient estimate, we propose an information-theoretic approach, which optimizes the embeddings of the future to be independent of previous actions. In our experiments, we apply the proposed hindsight value function in stochastic dynamic environments, including discrete-action environments and continuous-action environments. Compared with the standard state value function, the proposed hindsight value function consistently reduces the variance, stabilizes the training, and improves the eventual policy.
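The variance-reduction argument builds on the standard state-value baseline. The toy below shows that baseline effect in isolation (state-dependent returns plus dynamics noise); the paper's hindsight value additionally conditions on future information to cut the residual noise term, which this sketch does not reproduce:

```python
import random
import statistics

random.seed(0)

# Returns depend on the state plus dynamics noise: R = f(s) + eps.
states = [random.choice([0.0, 10.0]) for _ in range(10_000)]
returns = [s + random.gauss(0.0, 1.0) for s in states]

# A state value function V(s) ~ E[R | s]; here the exact per-state means.
value = {0.0: 0.0, 10.0: 10.0}
advantages = [r - value[s] for r, s in zip(returns, states)]

var_raw = statistics.pvariance(returns)           # ~ 25 (state spread) + 1 (noise)
var_baselined = statistics.pvariance(advantages)  # ~ 1 (dynamics noise only)
assert var_baselined < var_raw
```

Subtracting V(s) removes the between-state spread; a hindsight value would aim at the remaining ~1 unit of dynamics noise as well.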

【6】 On The Impact of Client Sampling on Federated Learning Convergence

Authors: Yann Fraboni, Richard Vidal, Laetitia Kameni, Marco Lorenzi
Affiliations: Université Côte d'Azur, Inria Sophia Antipolis, Epione Research Group, France; Accenture Labs, Sophia Antipolis, France
Link: https://arxiv.org/abs/2107.12211
Abstract: While clients' sampling is a central operation of current state-of-the-art federated learning (FL) approaches, the impact of this procedure on the convergence and speed of FL remains to date under-investigated. In this work we introduce a novel decomposition theorem for the convergence of FL, allowing to clearly quantify the impact of client sampling on the global model update. Contrarily to previous convergence analyses, our theorem provides the exact decomposition of a given convergence step, thus enabling accurate considerations about the role of client sampling and heterogeneity. First, we provide a theoretical ground for previously reported results on the relationship between FL convergence and the variance of the aggregation weights. Second, we prove for the first time that the quality of FL convergence is also impacted by the resulting covariance between aggregation weights. Third, we establish that the sum of the aggregation weights is another source of slow-down and should be equal to 1 to improve FL convergence speed. Our theory is general, and is here applied to Multinomial Distribution (MD) and Uniform sampling, the two default client sampling in FL, and demonstrated through a series of experiments in non-iid and unbalanced scenarios.
Our results suggest that MD sampling should be used as default sampling scheme, due to the resilience to the changes in data ratio during the learning process, while Uniform sampling is superior only in the special case when clients have the same amount of data.

【7】 An Efficient Insect Pest Classification Using Multiple Convolutional Neural Network Based Models

Authors: Hieu T. Ung, Huy Q. Ung, Binh T. Nguyen
Note: 22 pages, 15 figures
Link: https://arxiv.org/abs/2107.12189
Abstract: Accurate insect pest recognition is significant to protect the crop or take the early treatment on the infected yield, and it helps reduce the loss for the agriculture economy. Designing an automatic pest recognition system is necessary because manual recognition is slow, time-consuming, and expensive. Image-based pest classifiers using traditional computer vision methods are not efficient due to the complexity of the task. Insect pest classification is a difficult task because of various kinds, scales, shapes, complex backgrounds in the field, and high appearance similarity among insect species. With the rapid development of deep learning technology, the CNN-based method is the best way to develop a fast and accurate insect pest classifier. We present different convolutional neural network-based models in this work, including attention, feature pyramid, and fine-grained models. We evaluate our methods on two public datasets: the large-scale insect pest dataset, the IP102 benchmark dataset, and a smaller dataset, namely D0, in terms of the macro-average precision (MPre), the macro-average recall (MRec), the macro-average F1-score (MF1), the accuracy (Acc), and the geometric mean (GM).
The experimental results show that combining these convolutional neural network-based models can better perform than the state-of-the-art methods on these two datasets. For instance, the highest accuracy we obtained on IP102 and D0 is $74.13\%$ and $99.78\%$, respectively, bypassing the corresponding state-of-the-art accuracy: $67.1\%$ (IP102) and $98.8\%$ (D0). We also publish our codes for contributing to the current research related to the insect pest classification problem.
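The macro-averaged metrics used in this evaluation are straightforward to compute from per-class counts. In the sketch below, GM is read as the geometric mean of per-class recalls, which is one common convention and an assumption here:

```python
import math

def macro_metrics(y_true, y_pred, classes):
    """Macro-averaged precision/recall/F1, accuracy, and a geometric mean.

    GM is computed as the geometric mean of per-class recalls, one common
    convention (assumed here; papers sometimes define it differently)."""
    pre, rec, f1 = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        p_c = tp / (tp + fp) if tp + fp else 0.0
        r_c = tp / (tp + fn) if tp + fn else 0.0
        pre.append(p_c)
        rec.append(r_c)
        f1.append(2 * p_c * r_c / (p_c + r_c) if p_c + r_c else 0.0)
    k = len(classes)
    acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    gm = math.prod(rec) ** (1 / k)
    return sum(pre) / k, sum(rec) / k, sum(f1) / k, acc, gm

mpre, mrec, mf1, acc, gm = macro_metrics([0, 0, 1, 1], [0, 1, 1, 1], [0, 1])
```

Macro averages weight every class equally, which is why they are preferred over plain accuracy on the highly imbalanced IP102 classes.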

【8】 EGGS: Eigen-Gap Guided Search Making Subspace Clustering Easy

Authors: Jicong Fan, Yiheng Tu, Zhao Zhang, Mingbo Zhao
Affiliations: Zhao Zhang is with the School of Computer Science and Information Engineering, Hefei University of Technology; Mingbo Zhao is with the School of Information Science, Donghua University
Link: https://arxiv.org/abs/2107.12183
Abstract: The performance of spectral clustering heavily relies on the quality of affinity matrix. A variety of affinity-matrix-construction methods have been proposed but they have hyper-parameters to determine beforehand, which requires strong experience and lead to difficulty in real applications especially when the inter-cluster similarity is high or/and the dataset is large. On the other hand, we often have to determine to use a linear model or a nonlinear model, which still depends on experience. To solve these two problems, in this paper, we present an eigen-gap guided search method for subspace clustering. The main idea is to find the most reliable affinity matrix among a set of candidates constructed by linear and kernel regressions, where the reliability is quantified by the relative-eigen-gap of graph Laplacian defined in this paper. We show, theoretically and numerically, that the Laplacian matrix with a larger relative-eigen-gap often yields a higher clustering accuracy and stability. Our method is able to automatically search the best model and hyper-parameters in a pre-defined space.
The search space is very easy to determine and can be arbitrarily large, though a relatively compact search space can reduce the highly unnecessary computation. Our method has high flexibility and convenience in real applications, and also has low computational cost because the affinity matrix is not computed by iterative optimization. We extend the method to large-scale datasets such as MNIST, on which the time cost is less than 90s and the clustering accuracy is state-of-the-art. Extensive experiments of natural image clustering show that our method is more stable, accurate, and efficient than baseline methods.
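The eigen-gap idea can be demonstrated end-to-end in plain Python: build a graph Laplacian, extract its eigenvalues with the classical Jacobi rotation method, and inspect the gap. The `rel_gap` formula below is one plausible reading of a relative eigen-gap, not necessarily the paper's exact definition:

```python
import math

def jacobi_eigenvalues(mat, tol=1e-12, max_rotations=500):
    """Eigenvalues of a real symmetric matrix via classical Jacobi rotations."""
    n = len(mat)
    a = [row[:] for row in mat]
    for _ in range(max_rotations):
        # locate the largest off-diagonal entry
        p, q, big = 0, 1, 0.0
        for i in range(n):
            for j in range(i + 1, n):
                if abs(a[i][j]) > big:
                    big, p, q = abs(a[i][j]), i, j
        if big < tol:
            break
        # rotation angle that zeroes a[p][q]
        theta = 0.5 * math.atan2(2 * a[p][q], a[q][q] - a[p][p])
        c, s = math.cos(theta), math.sin(theta)
        for k in range(n):            # A <- A G
            akp, akq = a[k][p], a[k][q]
            a[k][p] = c * akp - s * akq
            a[k][q] = s * akp + c * akq
        for k in range(n):            # A <- G^T A
            apk, aqk = a[p][k], a[q][k]
            a[p][k] = c * apk - s * aqk
            a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def graph_laplacian(adj):
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0) - adj[i][j] for j in range(n)]
            for i in range(n)]

# Two disjoint triangles: a graph with exactly two clusters.
adj = [[0, 1, 1, 0, 0, 0], [1, 0, 1, 0, 0, 0], [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1], [0, 0, 0, 1, 0, 1], [0, 0, 0, 1, 1, 0]]
eigs = jacobi_eigenvalues(graph_laplacian(adj))  # approaches [0, 0, 3, 3, 3, 3]
k = 2  # candidate number of clusters
rel_gap = (eigs[k] - eigs[k - 1]) / (eigs[k] + 1e-12)
```

Two near-zero eigenvalues followed by a large jump signal two well-separated clusters; EGGS searches for the candidate affinity matrix whose Laplacian maximizes such a (relative) gap.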

【9】 Novel Span Measure, Spanning Sets and Applications

Authors: Nidhika Yadav
Link: https://arxiv.org/abs/2107.12178
Abstract: Rough Set based Spanning Sets were recently proposed to deal with uncertainties arising in the domain of natural language processing problems. This paper presents a novel span measure using upper approximations. The key contribution of this paper is to propose another uncertainty measure of span and spanning sets. Firstly, this paper proposes a new definition of computing span which uses upper approximation instead of boundary regions. This is useful in situations where computing upper approximations is much more convenient than computing boundary regions. Secondly, properties of the novel span and its relation with the earlier span measure are discussed. Thirdly, the paper presents application areas where the proposed span measure can be utilized.
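The two rough-set quantities the paper contrasts, upper approximation versus boundary region, are easy to compute from the equivalence classes of the universe; the span measure itself is defined in the paper and is not reproduced in this sketch:

```python
def approximations(partition, target):
    """Lower and upper approximations of `target` under the equivalence
    relation whose classes form `partition` (a list of disjoint sets)."""
    lower, upper = set(), set()
    for block in partition:
        if block <= target:
            lower |= block   # block entirely inside the target
        if block & target:
            upper |= block   # block overlapping the target
    return lower, upper

blocks = [{1, 2}, {3, 4}, {5}]
target = {1, 2, 3}
low, up = approximations(blocks, target)
boundary = up - low
# low == {1, 2}; up == {1, 2, 3, 4}; boundary == {3, 4}
```

The paper's observation is that the upper approximation (`up`) is sometimes cheaper to obtain than the boundary (`up - low`), motivating a span defined directly on it.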

【10】 Perceptually Validated Precise Local Editing for Facial Action Units with StyleGAN

Authors: Alara Zindancıoğlu, T. Metin Sezgin
Affiliations: Koç University, Istanbul, Turkey
Link: https://arxiv.org/abs/2107.12143
Abstract: The ability to edit facial expressions has a wide range of applications in computer graphics. The ideal facial expression editing algorithm needs to satisfy two important criteria. First, it should allow precise and targeted editing of individual facial actions. Second, it should generate high fidelity outputs without artifacts. We build a solution based on StyleGAN, which has been used extensively for semantic manipulation of faces. As we do so, we add to our understanding of how various semantic attributes are encoded in StyleGAN. In particular, we show that a naive strategy to perform editing in the latent space results in undesired coupling between certain action units, even if they are conceptually distinct. For example, although brow lowerer and lip tightener are distinct action units, they appear correlated in the training data. Hence, StyleGAN has difficulty in disentangling them. We allow disentangled editing of such action units by computing detached regions of influence for each action unit, and restrict editing to these regions. We validate the effectiveness of our local editing method through perception experiments conducted with 23 subjects. The results show that our method provides higher control over local editing and produces images with superior fidelity compared to the state-of-the-art methods.

【11】 Fine-Grained Emotion Prediction by Modeling Emotion Definitions

Authors: Gargi Singh, Dhanajit Brahma, Piyush Rai, Ashutosh Modi
Affiliations: CSE Department, Indian Institute of Technology Kanpur (IIT-K), Kanpur, India
Note: 8 pages, accepted at ACII 2021 for Orals
Link: https://arxiv.org/abs/2107.12135
Abstract: In this paper, we propose a new framework for fine-grained emotion prediction in the text through emotion definition modeling. Our approach involves a multi-task learning framework that models definitions of emotions as an auxiliary task while being trained on the primary task of emotion prediction. We model definitions using masked language modeling and class definition prediction tasks. Our models outperform existing state-of-the-art for the fine-grained emotion dataset GoEmotions. We further show that this trained model can be used for transfer learning on other benchmark datasets in emotion prediction with varying emotion label sets, domains, and sizes. The proposed models outperform the baselines on transfer learning experiments demonstrating the generalization capability of the models.

【12】 Structural Learning of Probabilistic Sentential Decision Diagrams under Partial Closed-World Assumption

Authors: Alessandro Antonucci, Alessandro Facchini, Lilith Mattei
Link: https://arxiv.org/abs/2107.12130
Abstract: Probabilistic sentential decision diagrams are a class of structured-decomposable probabilistic circuits especially designed to embed logical constraints. To adapt the classical LearnSPN scheme to learn the structure of these models, we propose a new scheme based on a partial closed-world assumption: data implicitly provide the logical base of the circuit. Sum nodes are thus learned by recursively clustering batches in the initial data base, while the partitioning of the variables obeys a given input vtree. Preliminary experiments show that the proposed approach might properly fit training data, and generalize well to test data, provided that these remain consistent with the underlying logical base, that is a relaxation of the training data base.

【13】 Learning to Adversarially Blur Visual Object Tracking

Authors: Qing Guo, Ziyi Cheng, Felix Juefei-Xu, Lei Ma, Xiaofei Xie, Yang Liu, Jianjun Zhao
Affiliations: Nanyang Technological University, Singapore; Kyushu University, Japan; Alibaba Group, USA; University of Alberta, Canada
Note: This work has been accepted to ICCV2021. 12 pages, 5 figures
Link: https://arxiv.org/abs/2107.12085
Abstract: Motion blur caused by the moving of the object or camera during the exposure can be a key challenge for visual object tracking, affecting tracking accuracy significantly. In this work, we explore the robustness of visual object trackers against motion blur from a new angle, i.e., adversarial blur attack (ABA). Our main objective is to online transfer input frames to their natural motion-blurred counterparts while misleading the state-of-the-art trackers during the tracking process. To this end, we first design the motion blur synthesizing method for visual tracking based on the generation principle of motion blur, considering the motion information and the light accumulation process. With this synthetic method, we propose optimization-based ABA (OP-ABA) by iteratively optimizing an adversarial objective function against the tracking w.r.t. the motion and light accumulation parameters. The OP-ABA is able to produce natural adversarial examples but the iteration can cause heavy time cost, making it unsuitable for attacking real-time trackers.
To alleviate this issue, we further propose one-step ABA (OS-ABA) where we design and train a joint adversarial motion and accumulation predictive network (JAMANet) with the guidance of OP-ABA, which is able to efficiently estimate the adversarial motion and accumulation parameters in a one-step way. The experiments on four popular datasets (e.g., OTB100, VOT2018, UAV123, and LaSOT) demonstrate that our methods are able to cause significant accuracy drops on four state-of-the-art trackers with high transferability. Please find the source code at https://github.com/tsingqguo/ABA.
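The light-accumulation principle behind the blur synthesis can be shown on a 1-D signal: motion blur is the average of the signal translated along the motion path. This is only the accumulation step; the adversarial optimization of the motion parameters is not sketched here:

```python
def motion_blur_1d(signal, shift, steps):
    """Light accumulation for a toy linear motion: the blurred signal is
    the average of `steps` translated copies of the sharp signal."""
    n = len(signal)
    out = [0.0] * n
    for t in range(steps):
        offset = round(t * shift / max(steps - 1, 1))
        for i in range(n):
            out[i] += signal[(i - offset) % n] / steps
    return out

sharp = [0.0] * 8
sharp[2] = 1.0  # a single bright pixel
blurred = motion_blur_1d(sharp, shift=3, steps=4)
# energy is preserved but smeared over positions 2..5
```

OP-ABA, in effect, searches over `shift`-like motion parameters (per region, in 2-D) so that the resulting natural-looking blur maximally degrades the tracker.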

【14】 An Argumentative Dialogue System for COVID-19 Vaccine Information

Authors: Bettina Fazzinga, Andrea Galassi, Paolo Torroni
Affiliations: ICAR CNR, Rende, Italy; DISI, University of Bologna, Bologna, Italy
Note: 20 pages, 2 figures, currently under submission
Link: https://arxiv.org/abs/2107.12079
Abstract: Dialogue systems are widely used in AI to support timely and interactive communication with users. We propose a general-purpose dialogue system architecture that leverages computational argumentation and state-of-the-art language technologies. We illustrate and evaluate the system using a COVID-19 vaccine information case study.

【15】 How Knowledge Graph and Attention Help? A Quantitative Analysis into Bag-level Relation Extraction

Authors: Zikun Hu, Yixin Cao, Lifu Huang, Tat-Seng Chua
Affiliations: National University of Singapore; S-Lab, Nanyang Technological University; Computer Science Department, Virginia Tech
Link: https://arxiv.org/abs/2107.12064
Abstract: Knowledge Graph (KG) and attention mechanism have been demonstrated effective in introducing and selecting useful information for weakly supervised methods. However, only qualitative analysis and ablation study are provided as evidence. In this paper, we contribute a dataset and propose a paradigm to quantitatively evaluate the effect of attention and KG on bag-level relation extraction (RE). We find that (1) higher attention accuracy may lead to worse performance as it may harm the model's ability to extract entity mention features; (2) the performance of attention is largely influenced by various noise distribution patterns, which is closely related to real-world datasets; (3) KG-enhanced attention indeed improves RE performance, while not through enhanced attention but by incorporating entity prior; and (4) attention mechanism may exacerbate the issue of insufficient training data. Based on these findings, we show that a straightforward variant of RE model can achieve significant improvements (6% AUC on average) on two real-world datasets as compared with three state-of-the-art baselines. Our codes and datasets are available at https://github.com/zig-kwin-hu/how-KG-ATT-help.

【16】 Predicting Game Engagement and Difficulty Using AI Players

Authors: Shaghayegh Roohi, Christian Guckelsberger, Asko Relas, Henri Heiskanen, Jari Takatalo, Perttu Hämäläinen
Affiliations: Aalto University
Note: 18 pages, 5 figures, 2 tables. In Proceedings ACM Human-Computer Interaction, Vol. 5, CHIPLAY, Article 231. Publication date: September 2021
Link: https://arxiv.org/abs/2107.12061
Abstract: This paper presents a novel approach to automated playtesting for the prediction of human player behavior and experience. It has previously been demonstrated that Deep Reinforcement Learning (DRL) game-playing agents can predict both game difficulty and player engagement, operationalized as average pass and churn rates. We improve this approach by enhancing DRL with Monte Carlo Tree Search (MCTS). We also motivate an enhanced selection strategy for predictor features, based on the observation that an AI agent's best-case performance can yield stronger correlations with human data than the agent's average performance. Both additions consistently improve the prediction accuracy, and the DRL-enhanced MCTS outperforms both DRL and vanilla MCTS in the hardest levels. We conclude that player modelling via automated playtesting can benefit from combining DRL and MCTS. Moreover, it can be worthwhile to investigate a subset of repeated best AI agent runs, if AI gameplay does not yield good predictions on average.
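Vanilla MCTS, which the DRL agents are combined with here, selects children by the standard UCB1 rule; a minimal version of that selection step:

```python
import math

def ucb1_select(stats, c=math.sqrt(2)):
    """MCTS child selection via UCB1: mean value plus an exploration bonus."""
    total = sum(visits for _, visits in stats)
    def score(child):
        value_sum, visits = child
        if visits == 0:
            return float("inf")  # unvisited children are expanded first
        return value_sum / visits + c * math.sqrt(math.log(total) / visits)
    return max(range(len(stats)), key=lambda i: score(stats[i]))

# (value_sum, visit_count) per child
assert ucb1_select([(9.0, 10), (5.0, 5), (0.0, 0)]) == 2
chosen = ucb1_select([(9.0, 10), (5.0, 5)])  # exploration favours the rarer child
```

In the paper's hybrid, the DRL policy replaces or guides the random rollouts; the selection rule itself is the standard one shown above.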

【17】 SVEva Fair: A Framework for Evaluating Fairness in Speaker Verification

Authors: Wiebke Toussaint, Aaron Yi Ding
Affiliations: Delft University of Technology
Link: https://arxiv.org/abs/2107.12049
Abstract: Despite the success of deep neural networks (DNNs) in enabling on-device voice assistants, increasing evidence of bias and discrimination in machine learning is raising the urgency of investigating the fairness of these systems. Speaker verification is a form of biometric identification that gives access to voice assistants. Due to a lack of fairness metrics and evaluation frameworks that are appropriate for testing the fairness of speaker verification components, little is known about how model performance varies across subgroups, and what factors influence performance variation. To tackle this emerging challenge, we design and develop SVEva Fair, an accessible, actionable and model-agnostic framework for evaluating the fairness of speaker verification components. The framework provides evaluation measures and visualisations to interrogate model performance across speaker subgroups and compare fairness between models. We demonstrate SVEva Fair in a case study with end-to-end DNNs trained on the VoxCeleb datasets to reveal potential bias in existing embedded speech recognition systems based on the demographic attributes of speakers. Our evaluation shows that publicly accessible benchmark models are not fair and consistently produce worse predictions for some nationalities, and for female speakers of most nationalities.
To pave the way for fair and reliable embedded speaker verification, SVEva Fair has been implemented as an open-source python library and can be integrated into the embedded ML development pipeline to facilitate developers and researchers in troubleshooting unreliable speaker verification performance, and selecting high impact approaches for mitigating fairness challenges.
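Although SVEva Fair's own measures are defined in the paper and library, the kind of subgroup comparison it performs can be illustrated with per-group false accept / false reject rates at a verification threshold (the record layout below is illustrative, not the SVEva Fair API):

```python
def subgroup_error_rates(records, threshold):
    """False accept rate (FAR) and false reject rate (FRR) per subgroup.

    Each record is (subgroup, verification_score, is_same_speaker); the
    field layout is illustrative, not the SVEva Fair API."""
    rates = {}
    for group in {g for g, _, _ in records}:
        trials = [(s, y) for g, s, y in records if g == group]
        false_accepts = sum(s >= threshold and not y for s, y in trials)
        false_rejects = sum(s < threshold and y for s, y in trials)
        impostors = sum(not y for _, y in trials)
        genuine = sum(y for _, y in trials)
        rates[group] = (false_accepts / impostors if impostors else 0.0,
                        false_rejects / genuine if genuine else 0.0)
    return rates

data = [("A", 0.9, True), ("A", 0.2, False),
        ("B", 0.4, True), ("B", 0.6, False)]
rates = subgroup_error_rates(data, threshold=0.5)
# subgroup B fails both ways at this threshold while A is error-free
```

Large gaps between subgroup error rates at a shared operating threshold are exactly the kind of unfairness the VoxCeleb case study surfaces.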

【18】 3D AGSE-VNet: An Automatic Brain Tumor MRI Data Segmentation Framework 标题:3D AGSE-VNet:一种脑肿瘤MRI数据自动分割框架

作者:Xi Guan,Guang Yang,Jianming Ye,Weiji Yang,Xiaomei Xu,Weiwei Jiang,Xiaobo Lai 机构: School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Cardiovascular Research Centre, Royal Brompton Hospital, London, SW,NP, UK, National Heart and Lung Institute, Imperial College London, London, SW,AZ, UK 备注:34 pages, 12 figures, Accepted by BMC Medical Imaging 链接:https://arxiv.org/abs/2107.12046 摘要:背景:脑胶质瘤是最常见的脑恶性肿瘤,发病率高,死亡率超过3%,严重危害人类健康。临床上获取脑肿瘤影像的主要方法是MRI。从多模态MRI扫描图像中分割脑肿瘤区域,有助于治疗检查、诊断后监测和疗效评价。然而,目前临床上常用的脑肿瘤分割操作仍然是手工分割,不仅耗时,而且不同操作者之间的性能差异较大,迫切需要一种一致、准确的自动分割方法。方法:针对上述问题,我们提出了一种脑肿瘤MRI数据自动分割框架AGSE-VNet。 摘要:Background: Glioma is the most common brain malignant tumor, with a high morbidity rate and a mortality rate of more than three percent, which seriously endangers human health. The main method of acquiring brain tumor images in the clinic is MRI. Segmentation of brain tumor regions from multi-modal MRI scan images is helpful for treatment inspection, post-diagnosis monitoring, and effect evaluation of patients. However, the common operation in clinical brain tumor segmentation is still manual segmentation, which is time-consuming and shows large performance differences between operators; a consistent and accurate automatic segmentation method is therefore urgently needed. Methods: To meet the above challenges, we propose an automatic brain tumor MRI data segmentation framework which is called AGSE-VNet. 
In our study, the Squeeze and Excite (SE) module is added to each encoder, the Attention Guide Filter (AG) module is added to each decoder, using the channel relationship to automatically enhance the useful information in the channel to suppress the useless information, and use the attention mechanism to guide the edge information and remove the influence of irrelevant information such as noise. Results: We used the BraTS2020 challenge online verification tool to evaluate our approach. The focus of verification is that the Dice scores of the whole tumor (WT), tumor core (TC) and enhanced tumor (ET) are 0.68, 0.85 and 0.70, respectively. Conclusion: Although MRI images have different intensities, AGSE-VNet is not affected by the size of the tumor, and can more accurately extract the features of the three regions, it has achieved impressive results and made outstanding contributions to the clinical diagnosis and treatment of brain tumor patients.
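The Squeeze-and-Excitation recalibration that AGSE-VNet attaches to each encoder can be sketched in a few lines. The squeeze/excite/scale structure below is the standard SE design; the array shapes and weight matrices are illustrative placeholders, not the paper's exact configuration.

```python
import numpy as np

def squeeze_excite(feature_map, w1, w2):
    """Channel-wise Squeeze-and-Excitation reweighting (illustrative sketch).

    feature_map: (C, H, W); w1: (C_red, C) and w2: (C, C_red) are the
    bottleneck weights (hypothetical shapes, biases omitted)."""
    # Squeeze: global average pooling collapses each channel to one scalar
    z = feature_map.mean(axis=(1, 2))            # shape (C,)
    # Excite: bottleneck MLP with ReLU, then a sigmoid gate per channel
    s = np.maximum(w1 @ z, 0.0)                  # shape (C_red,)
    gate = 1.0 / (1.0 + np.exp(-(w2 @ s)))       # shape (C,), values in (0, 1)
    # Scale: reweight each channel by its learned gate
    return feature_map * gate[:, None, None]
```

The gate amplifies informative channels and suppresses less useful ones, which is the "enhance useful information in the channel" behavior the abstract describes.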

【19】 ContextNet: A Click-Through Rate Prediction Framework Using Contextual information to Refine Feature Embedding 标题:ContextNet:一种利用上下文信息优化特征嵌入的点击率预测框架

作者:Zhiqiang Wang,Qingyun She,PengTao Zhang,Junlin Zhang 机构:Sina Weibo Corp, Beijing, China 备注:arXiv admin note: text overlap with arXiv:2102.07619 链接:https://arxiv.org/abs/2107.12025 摘要:点击率(CTR)估计是个性化广告和推荐系统中的一项基本任务,排序模型能否有效捕获复杂的高阶特征至关重要。受ELMO和Bert在NLP领域取得成功的启发(它们根据单词出现的上下文句子信息动态细化单词嵌入),我们认为在CTR估计任务中,根据输入实例中包含的上下文信息,逐层动态细化每个特征的嵌入也很重要。我们可以通过这种方式有效地捕获每个特征的有用特征交互。在本文中,我们提出了一个新的CTR框架ContextNet,它通过根据输入上下文动态细化每个特征的嵌入来隐式地建模高阶特征交互。具体来说,ContextNet由两个关键组件组成:上下文嵌入模块和ContextNet块。上下文嵌入模块从输入实例中收集每个特征的上下文信息,ContextNet块通过将上下文高阶交互信息合并到特征嵌入中,逐层维护每个特征的嵌入,动态细化特征的表示。为了使框架具体化,我们还通过在ContextNet块中引入线性上下文嵌入网络和两个非线性映射子网络,提出了该框架下的两个模型(ContextNet-PFFN和ContextNet-SFFN)。我们在四个真实数据集上进行了大量的实验,实验结果表明,我们提出的ContextNet-PFFN和ContextNet-SFFN模型的性能明显优于DeepFM和xDeepFM等最新模型。 摘要:Click-through rate (CTR) estimation is a fundamental task in personalized advertising and recommender systems and it's important for ranking models to effectively capture complex high-order features. Inspired by the success of ELMO and Bert in the NLP field, which dynamically refine word embeddings according to the context sentence information where the word appears, we think it's also important to dynamically refine each feature's embedding layer by layer according to the context information contained in input instance in CTR estimation tasks. We can effectively capture the useful feature interactions for each feature in this way. In this paper, we propose a novel CTR Framework named ContextNet that implicitly models high-order feature interactions by dynamically refining each feature's embedding according to the input context. Specifically, ContextNet consists of two key components: contextual embedding module and ContextNet block. Contextual embedding module aggregates contextual information for each feature from input instance and ContextNet block maintains each feature's embedding layer by layer and dynamically refines its representation by merging contextual high-order interaction information into feature embedding. 
To make the framework specific, we also propose two models (ContextNet-PFFN and ContextNet-SFFN) under this framework by introducing a linear contextual embedding network and two non-linear mapping sub-networks in the ContextNet block. We conduct extensive experiments on four real-world datasets and the results demonstrate that our proposed ContextNet-PFFN and ContextNet-SFFN models outperform state-of-the-art models such as DeepFM and xDeepFM significantly.
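The refine-by-context idea can be caricatured in a few lines: aggregate instance-level context, then merge it back into each feature's embedding. The single linear aggregation and element-wise merge below are simplifying assumptions for illustration, not the paper's exact two-convolution design.

```python
import numpy as np

def refine_by_context(E, W_ctx):
    """One context-driven embedding refinement step (hypothetical sketch).

    E: (n_fields, d) feature embeddings of one input instance.
    W_ctx: (n_fields * d, n_fields * d) weights of an assumed linear
    contextual-embedding module."""
    # Aggregate contextual information from the whole instance
    ctx = (E.reshape(-1) @ W_ctx).reshape(E.shape)
    # Merge the context back into each feature's embedding (element-wise)
    return E * ctx
```

Stacking such refinement steps layer by layer is what lets each feature's representation absorb high-order interaction information from the rest of the instance.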

【20】 Leaf-FM: A Learnable Feature Generation Factorization Machine for Click-Through Rate Prediction 标题:Leaf-FM:一种用于点击率预测的可学习特征生成因子分解机

作者:Qingyun She,Zhiqiang Wang,Junlin Zhang 机构:Sina Weibo Corp, Beijing, China 链接:https://arxiv.org/abs/2107.12024 摘要:点击率预测在个性化广告和推荐系统中占有重要地位。尽管近年来已经提出了许多模型,如FM、FFM和DeepFM,但是在许多应用中,特征工程仍然是提高模型性能的一个非常重要的方法,因为使用原始特征很少能得到最优的结果。例如,通常通过增加一个新的特征来将连续特征转化为幂形式,使其容易形成特征的非线性函数。然而,这种特征工程在很大程度上依赖于人们的经验,既费时又费力。另一方面,简洁的CTR模型具有快速的在线服务速度和良好的模型性能,对于许多实际应用是至关重要的。本文提出了一种基于FM的Leaf-FM模型,通过自动学习变换函数,在原有特征嵌入的基础上生成新的特征。根据原始特征与生成特征相结合的不同策略,设计了三个具体的Leaf-FM模型。在三个真实数据集上进行了大量的实验,结果表明Leaf-FM模型的性能明显优于标准FMs模型。与FFMs相比,Leaf-FM能以更少的参数获得更好的性能。在Avazu和Malware数据集上,加法版本(add version)的Leaf-FM与一些基于深度学习的模型(如DNN和AutoInt)的性能相当。作为一种改进的FM模型,Leaf-FM在在线服务阶段具有与FM相同的计算复杂度,这意味着Leaf-FM具有更好的性能和较高的计算效率,可以应用于许多工业应用中。 摘要:Click-through rate (CTR) prediction plays an important role in personalized advertising and recommender systems. Though many models have been proposed such as FM, FFM and DeepFM in recent years, feature engineering is still a very important way to improve the model performance in many applications because using raw features can rarely lead to optimal results. For example, the continuous features are usually transformed to the power forms by adding a new feature to allow it to easily form non-linear functions of the feature. However, this kind of feature engineering heavily relies on people's experience and it is both time-consuming and labor-intensive. On the other side, concise CTR model with both fast online serving speed and good model performance is critical for many real life applications. In this paper, we propose the Leaf-FM model based on FM to generate new features from the original feature embedding by learning the transformation functions automatically. We also design three concrete Leaf-FM models according to the different strategies of combining the original and the generated features. Extensive experiments are conducted on three real-world datasets and the results show the Leaf-FM model outperforms standard FMs by a large margin. 
Compared with FFMs, Leaf-FM can achieve significantly better performance with far fewer parameters. On the Avazu and Malware datasets, the add-version Leaf-FM achieves comparable performance with some deep learning based models such as DNN and AutoInt. As an improved FM model, Leaf-FM has the same computation complexity as FM in the online serving phase, which means Leaf-FM is applicable in many industry applications because of its better performance and high computation efficiency.
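Leaf-FM extends factorization machines (FM). As background, the standard second-order FM interaction term can be sketched with the well-known sum-square identity; this illustrates plain FM only, not Leaf-FM's learned feature-generation layer.

```python
import numpy as np

def fm_second_order(E):
    """Second-order FM interaction term over field embeddings E of shape (n, d).

    Uses the identity sum_{i<j} <v_i, v_j> = 0.5 * ((sum_i v_i)^2 - sum_i v_i^2),
    evaluated per embedding dimension and then summed."""
    sum_sq = E.sum(axis=0) ** 2        # square of the sum, per dimension
    sq_sum = (E ** 2).sum(axis=0)      # sum of the squares, per dimension
    return 0.5 * float((sum_sq - sq_sum).sum())
```

The identity reduces the naive O(n^2) pairwise computation to O(n·d), which is why FM-style models serve quickly online.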

【21】 Benign Adversarial Attack: Tricking Algorithm for Goodness 标题:良性对抗性攻击:行善的欺骗算法

作者:Xian Zhao,Jiaming Zhang,Zhiyu Lin,Jitao Sang 机构:Beijing Jiaotong University, Beijing, China 备注:Preprint. Under review 链接:https://arxiv.org/abs/2107.11986 摘要:尽管机器学习算法在许多领域得到了成功的应用,但它仍然存在着一些臭名昭著的问题,比如易受对手的攻击。除了在对抗攻击和防御之间陷入猫捉老鼠的游戏中,本文还提供了另一种视角来考虑对抗实例,并探讨我们能否在良性应用中利用它。我们首先提出了一种新的视觉信息分类方法,包括任务相关性和语义取向。对抗性实例的出现是由于算法利用了与任务相关的非语义信息。在经典的机器学习机制中,任务相关的非语义信息在很大程度上被忽略了,但它有三个有趣的特点:(1)算法独有的;(2)反映出共同的弱点;(3)作为特征可利用。受此启发,我们提出了一种称为良性对抗攻击的勇敢的新思想,从三个方向利用对抗性例子:(1)对抗性图灵测试,(2)拒绝恶意算法,(3)对抗性数据扩充。每一个方向都有动机阐述、理由分析和原型应用程序来展示其潜力。 摘要:In spite of the successful application in many fields, machine learning algorithms today suffer from notorious problems like vulnerability to adversarial examples. Beyond falling into the cat-and-mouse game between adversarial attack and defense, this paper provides alternative perspective to consider adversarial example and explore whether we can exploit it in benign applications. We first propose a novel taxonomy of visual information along task-relevance and semantic-orientation. The emergence of adversarial example is attributed to algorithm's utilization of task-relevant non-semantic information. While largely ignored in classical machine learning mechanisms, task-relevant non-semantic information enjoys three interesting characteristics as (1) exclusive to algorithm, (2) reflecting common weakness, and (3) utilizable as features. Inspired by this, we present brave new idea called benign adversarial attack to exploit adversarial examples for goodness in three directions: (1) adversarial Turing test, (2) rejecting malicious algorithm, and (3) adversarial data augmentation. Each direction is positioned with motivation elaboration, justification analysis and prototype applications to showcase its potential.

【22】 Trade When Opportunity Comes: Price Movement Forecasting via Locality-Aware Attention and Adaptive Refined Labeling 标题:机会来临时的交易:通过位置感知关注和自适应精细化标签预测价格走势

作者:Liang Zeng,Lei Wang,Hui Niu,Jian Li,Ruchen Zhang,Zhonghao Dai,Dewei Zhu,Ling Wang 机构:Institute for Interdisciplinary Information Sciences (IIIS), Tsinghua University, China, Huatai Securities Co., Ltd, China 链接:https://arxiv.org/abs/2107.11972 摘要:价格变动预测是根据当前市场状况和其他相关信息,对金融资产未来走势进行预测。近年来,机器学习(ML)方法在价格变动预测中得到了越来越广泛的应用,并取得了很好的效果。大多数现有的ML解决方案将预测问题描述为整个训练数据集中的分类(预测方向)或回归(预测回报)问题。然而,由于金融数据的极低信噪比和随机性,良好的交易机会极为稀缺。因此,如果不仔细选择有潜在收益的样本,这种ML方法很容易捕获噪声的模式而不是真实的信号。为了解决上述问题,我们提出了一个新的框架LARA(位置感知注意和自适应精细标记),它包含以下三个组成部分:1)位置感知注意通过关注样本的标签信息,自动提取出有潜在收益的样本,从而在这些样本上构造更精确的分类器。2)自适应细化标签进一步迭代细化标签,降低样本噪声。3)借助于度量学习技术,位置感知注意力可以享受特定于任务的距离度量,并以一种更有效的方式将注意力分布在潜在盈利的样本上。为了验证我们的方法,我们在三个真实的金融市场上进行了综合实验:ETF、中国A股股票市场和加密货币市场。在Qlib平台上,与时间序列分析方法和一组基于机器学习的竞争对手相比,LARA取得了优异的性能。大量的消融研究和实验表明,LARA确实抓住了更可靠的交易机会。 摘要:Price movement forecasting aims at predicting the future trends of financial assets based on the current market conditions and other relevant information. Recently, machine learning (ML) methods have become increasingly popular and achieved promising results for price movement forecasting in both academia and industry. Most existing ML solutions formulate the forecasting problem as a classification (to predict the direction) or a regression (to predict the return) problem in the entire set of training data. However, due to the extremely low signal-to-noise ratio and stochastic nature of financial data, good trading opportunities are extremely scarce. As a result, without careful selection of potentially profitable samples, such ML methods are prone to capture the patterns of noises instead of real signals. To address the above issues, we propose a novel framework, LARA (Locality-Aware Attention and Adaptive Refined Labeling), which contains the following three components: 1) Locality-aware attention automatically extracts the potentially profitable samples by attending to their label information in order to construct a more accurate classifier on these selected samples. 
2) Adaptive refined labeling further iteratively refines the labels, alleviating the noise of samples. 3) Equipped with metric learning techniques, Locality-aware attention enjoys task-specific distance metrics and distributes attention on potentially profitable samples in a more effective way. To validate our method, we conduct comprehensive experiments on three real-world financial markets: ETFs, China's A-share stock market, and the cryptocurrency market. LARA achieves superior performance compared with the time-series analysis methods and a set of machine learning based competitors on the Qlib platform. Extensive ablation studies and experiments demonstrate that LARA indeed captures more reliable trading opportunities.

【23】 Playtesting: What is Beyond Personas 标题:游戏测试:人物角色之外的是什么

作者:Sinan Ariyurek,Elif Surer,Aysu Betin-Can 机构:Graduate School of Informatics, Middle East Technical University, Ankara, Turkey 链接:https://arxiv.org/abs/2107.11965 摘要:游戏测试是游戏设计过程中必不可少的一步。游戏设计者利用游戏测试的反馈来改进他们的设计。游戏设计者可以使用程序角色来自动化游戏测试过程。在本文中,我们提出了两种改进自动播放测试的方法。首先,我们提出了一个基于目标的角色模型,我们称之为开发角色——开发角色提出了一个动态角色模型,而当前的角色模型是静态的。游戏设计者可以使用开发中的角色来模拟玩家在玩游戏时所经历的变化。此外,人类游戏测试员知道她以前测试过哪些路径,在随后的测试中,她可能会测试不同的路径。然而,RL代理忽略了先前生成的轨迹。我们提出了一种新的方法,帮助强化学习(RL)代理产生不同于以往的轨迹。我们将此方法称为替代路径查找器(APF)。我们提出了一个通用的APF框架,可以应用于所有RL代理。APF是用以前的轨迹训练的,APF能区分新的状态和相似的状态。我们使用通用视频游戏人工智能(GVG-AI)和VizDoom框架来测试我们提出的方法。在实验中我们使用了近端策略优化(PPO)RL代理。首先,我们证明了开发中的角色生成的playtest数据不能使用过程角色生成。其次,我们介绍了使用APF找到的替代路径。我们证明了APF惩罚先前的路径并奖励不同的路径。 摘要:Playtesting is an essential step in the game design process. Game designers use the feedback from playtests to refine their design. Game designers may employ procedural personas to automate the playtesting process. In this paper, we present two approaches to improve automated playtesting. First, we propose a goal-based persona model, which we call developing persona -- developing persona proposes a dynamic persona model, whereas the current persona models are static. Game designers can use the developing persona to model the changes that a player undergoes while playing a game. Additionally, a human playtester knows which paths she has tested before, and during the consequent tests, she may test different paths. However, RL agents disregard the previously generated trajectories. We propose a novel methodology that helps Reinforcement Learning (RL) agents to generate distinct trajectories than the previous trajectories. We refer to this methodology as Alternative Path Finder (APF). We present a generic APF framework that can be applied to all RL agents. APF is trained with the previous trajectories, and APF distinguishes the novel states from similar states. We use the General Video Game Artificial Intelligence (GVG-AI) and VizDoom frameworks to test our proposed methodologies. 
We use Proximal Policy Optimization (PPO) RL agent during experiments. First, we show that the playtest data generated by the developing persona cannot be generated using the procedural personas. Second, we present the alternative paths found using APF. We show that the APF penalizes the previous paths and rewards the distinct paths.

【24】 Towards Propagation Uncertainty: Edge-enhanced Bayesian Graph Convolutional Networks for Rumor Detection 标题:走向传播不确定性:用于谣言检测的边增强贝叶斯图卷积网络

作者:Lingwei Wei,Dou Hu,Wei Zhou,Zhaojuan Yue,Songlin Hu 机构: Institute of Information Engineering, Chinese Academy of Sciences, National Computer System Engineering Research Institute of China, Computer Network Information Center, Chinese Academy of Sciences 备注:Accepted by ACL 2021 main conference 链接:https://arxiv.org/abs/2107.11934 摘要:在社交媒体上发现谣言是一项非常重要的任务,对经济、公共卫生等都有重要意义。以前的工作通常从文本和传播结构中捕捉有效的特征。然而,由于谣言制造者的狡猾和传播数据收集的有限性,传播结构中不可靠关系引起的不确定性是常见的,也是不可避免的。大多数方法忽略了它,可能严重限制了特征的学习。针对这一问题,本文首次尝试探讨传播不确定性在谣言检测中的应用。具体地说,我们提出了一种新的边缘增强贝叶斯图卷积网络(EBGCN)来捕获鲁棒的结构特征。该模型采用贝叶斯方法自适应地重新考虑潜在关系的可靠性。此外,我们还设计了一个新的边缘一致性训练框架,通过加强关系的一致性来优化模型。在三个公共基准数据集上的实验结果表明,该模型在谣言检测和早期谣言检测任务上均优于基线方法。 摘要:Detecting rumors on social media is a very critical task with significant implications to the economy, public health, etc. Previous works generally capture effective features from texts and the propagation structure. However, the uncertainty caused by unreliable relations in the propagation structure is common and inevitable due to wily rumor producers and the limited collection of spread data. Most approaches neglect it and may seriously limit the learning of features. Towards this issue, this paper makes the first attempt to explore propagation uncertainty for rumor detection. Specifically, we propose a novel Edge-enhanced Bayesian Graph Convolutional Network (EBGCN) to capture robust structural features. The model adaptively rethinks the reliability of latent relations by adopting a Bayesian approach. Besides, we design a new edge-wise consistency training framework to optimize the model by enforcing consistency on relations. Experiments on three public benchmark datasets demonstrate that the proposed model achieves better performance than baseline methods on both rumor detection and early rumor detection tasks.

【25】 On Blame Attribution for Accountable Multi-Agent Sequential Decision Making 标题:可问责多智能体序贯决策中的责备归因研究

作者:Stelios Triantafyllou,Adish Singla,Goran Radanovic 机构:MPI-SWS 链接:https://arxiv.org/abs/2107.11927 摘要:责备归因是可问责决策的一个关键方面,因为它提供了一种量化代理人对决策结果所负责任的方法。本文研究合作多智能体序贯决策中的责备归因问题。作为特别关注的场景,本文着重研究由多智能体马尔可夫决策过程(MMDP)形式化的合作决策问题,并分析了源自或受合作博弈论中现有概念启发的各种责备归因方法。我们在所关注的场景中形式化了责备归因的理想性质,并分析了这些性质与所研究的责备归因方法之间的关系。有趣的是,我们发现一些著名的责备归因方法,如Shapley值,并不具有绩效激励作用,而另一些方法,如Banzhaf指数,则可能过度责备代理人。为了缓解这些价值错位和公平性问题,我们引入了一种新的责备归因方法,它所满足的性质组合是独一无二的,该方法以解释力(通过对代理人的欠责备)换取上述性质。我们进一步展示了如何考虑代理人决策策略的不确定性,并通过实验:a)验证了所研究的责备归因方法的定性性质,b)分析了它们对不确定性的鲁棒性。 摘要:Blame attribution is one of the key aspects of accountable decision making, as it provides means to quantify the responsibility of an agent for a decision making outcome. In this paper, we study blame attribution in the context of cooperative multi-agent sequential decision making. As a particular setting of interest, we focus on cooperative decision making formalized by Multi-Agent Markov Decision Processes (MMDP), and we analyze different blame attribution methods derived from or inspired by existing concepts in cooperative game theory. We formalize desirable properties of blame attribution in the setting of interest, and we analyze the relationship between these properties and the studied blame attribution methods. Interestingly, we show that some of the well known blame attribution methods, such as Shapley value, are not performance-incentivizing, while others, such as Banzhaf index, may over-blame agents. To mitigate these value misalignment and fairness issues, we introduce a novel blame attribution method, unique in the set of properties it satisfies, which trade-offs explanatory power (by under-blaming agents) for the aforementioned properties. We further show how to account for uncertainty about agents' decision making policies, and we experimentally: a) validate the qualitative properties of the studied blame attribution methods, and b) analyze their robustness to uncertainty.
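The abstract contrasts the Shapley value and the Banzhaf index as candidate blame-attribution methods. For orientation, the Shapley value of each agent in a small cooperative game can be computed exactly by averaging marginal contributions over all arrival orders; this is standard background, not the paper's proposed method, and the two-agent game in the test is a hypothetical example.

```python
from itertools import permutations

def shapley_values(players, value):
    """Exact Shapley values for a small cooperative game.

    players: list of hashable agent ids; value: maps a frozenset coalition
    to its worth (the grand-coalition outcome being attributed)."""
    phi = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        coalition = set()
        for p in order:
            before = value(frozenset(coalition))
            coalition.add(p)
            # Marginal contribution of p under this arrival order
            phi[p] += value(frozenset(coalition)) - before
    return {p: phi[p] / len(orders) for p in players}
```

By construction the attributions satisfy efficiency (they sum to the worth of the grand coalition), one of the axiomatic properties the paper's analysis revolves around.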

【26】 Measuring Ethics in AI with AI: A Methodology and Dataset Construction 标题:用人工智能测量人工智能中的伦理:一种方法论和数据集构建

作者:Pedro H. C. Avelar,Rafael B. Audibert,Anderson R. Tavares,Luís C. Lamb 机构:Universidade Federal do Rio Grande do Sul 链接:https://arxiv.org/abs/2107.11913 摘要:最近,在人工智能中使用合理的度量和量度已经成为学术界、政府和工业界感兴趣的课题。衡量不同现象的努力在人工智能界得到了广泛的关注,一些有影响力的实地报告和政策文件的发表就说明了这一点。这些指标的目的是帮助决策者了解人工智能和机器学习领域的关键进展的快速发展和影响。在这篇论文中,我们建议使用人工智能技术的这些新发现的能力来增强我们的人工智能测量能力。我们通过训练一个模型来对与道德问题和关注相关的出版物进行分类。在我们的方法中,我们使用一个专家,手工整理的数据集作为训练集,然后评估一大组研究论文。最后,我们强调了人工智能度量的含义,特别是它们对开发可信和公平的人工智能工具和技术的贡献。关键词:人工智能伦理;AI公平;AI测量。计算机科学中的伦理学。 摘要:Recently, the use of sound measures and metrics in Artificial Intelligence has become the subject of interest of academia, government, and industry. Efforts towards measuring different phenomena have gained traction in the AI community, as illustrated by the publication of several influential field reports and policy documents. These metrics are designed to help decision takers to inform themselves about the fast-moving and impacting influences of key advances in Artificial Intelligence in general and Machine Learning in particular. In this paper we propose to use such newfound capabilities of AI technologies to augment our AI measuring capabilities. We do so by training a model to classify publications related to ethical issues and concerns. In our methodology we use an expert, manually curated dataset as the training set and then evaluate a large set of research papers. Finally, we highlight the implications of AI metrics, in particular their contribution towards developing trustful and fair AI-based tools and technologies. Keywords: AI Ethics; AI Fairness; AI Measurement. Ethics in Computer Science.

【27】 Transferable Dialogue Systems and User Simulators 标题:可转移对话系统和用户模拟器

作者:Bo-Hsiang Tseng,Yinpei Dai,Florian Kreyssig,Bill Byrne 机构:†Engineering Department, University of Cambridge, UK, ‡Alibaba Group 备注:Accepted by ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2107.11904 摘要:训练对话系统的困难之一是缺乏训练数据。我们探讨了通过对话系统和用户模拟器之间的交互来创建对话数据的可能性。我们的目标是开发一个建模框架,可以通过两个代理之间的自我游戏来整合新的对话场景。在这个框架中,我们首先在一组源域对话上对两个agent进行预训练,使它们能够通过自然语言进行对话。通过对少量的目标域数据进行进一步的微调,智能体可以继续进行交互,目的是通过具有结构化奖励函数的强化学习来改善其行为。在MultiWOZ数据集上的实验中,研究了两个实际的迁移学习问题:1)域自适应问题和2)单域到多域迁移问题。我们证明了所提出的框架是非常有效的引导性能的两个代理人在转移学习。我们还表明,我们的方法可以提高对话系统在完整数据集上的性能。 摘要:One of the difficulties in training dialogue systems is the lack of training data. We explore the possibility of creating dialogue data through the interaction between a dialogue system and a user simulator. Our goal is to develop a modelling framework that can incorporate new dialogue scenarios through self-play between the two agents. In this framework, we first pre-train the two agents on a collection of source domain dialogues, which equips the agents to converse with each other via natural language. With further fine-tuning on a small amount of target domain data, the agents continue to interact with the aim of improving their behaviors using reinforcement learning with structured reward functions. In experiments on the MultiWOZ dataset, two practical transfer learning problems are investigated: 1) domain adaptation and 2) single-to-multiple domain transfer. We demonstrate that the proposed framework is highly effective in bootstrapping the performance of the two agents in transfer learning. We also show that our method leads to improvements in dialogue system performance on complete datasets.

【28】 Hybrid Autoregressive Solver for Scalable Abductive Natural Language Inference 标题:面向可扩展溯因自然语言推理的混合自回归求解器

作者:Marco Valentino,Mokanarangan Thayaparan,Deborah Ferreira,André Freitas 机构:Department of Computer Science, University of Manchester, United Kingdom†, Idiap Research Institute, Switzerland‡ 链接:https://arxiv.org/abs/2107.11879 摘要:为科学问题重新生成自然语言解释,对于评估复杂的多跳和溯因推理能力是一项具有挑战性的任务。在这种设置下,以交叉编码器架构使用、经人类注释解释训练的Transformer可以达到最先进的性能。然而,尽管人们对所构建解释的质量给予了很大关注,大规模执行溯因推理的问题仍有待研究。由于本质上不可扩展,交叉编码器架构范式不适合在大规模事实库上进行高效的多跳推理。为了同时最大化精度和推理速度,我们提出了一种混合溯因求解器,它利用解释中的显式模式,将稠密双编码器与解释力的稀疏模型进行自回归组合。我们的实验表明,所提出的框架可以达到与最先进的交叉编码器相当的性能,同时速度快约50倍,并可扩展到包含数百万条事实的语料库。此外,我们在不进行额外训练的情况下研究了混合化对语义漂移和科学问答的影响,结果表明,混合化可以提高解释的质量,并有助于提高下游推理的性能。 摘要:Regenerating natural language explanations for science questions is a challenging task for evaluating complex multi-hop and abductive inference capabilities. In this setting, Transformers trained on human-annotated explanations achieve state-of-the-art performance when adopted as cross-encoder architectures. However, while much attention has been devoted to the quality of the constructed explanations, the problem of performing abductive inference at scale is still under-studied. As intrinsically not scalable, the cross-encoder architectural paradigm is not suitable for efficient multi-hop inference on massive facts banks. To maximise both accuracy and inference time, we propose a hybrid abductive solver that autoregressively combines a dense bi-encoder with a sparse model of explanatory power, computed leveraging explicit patterns in the explanations. Our experiments demonstrate that the proposed framework can achieve performance comparable with the state-of-the-art cross-encoder while being $\approx 50$ times faster and scalable to corpora of millions of facts. Moreover, we study the impact of the hybridisation on semantic drift and science question answering without additional training, showing that it boosts the quality of the explanations and contributes to improved downstream inference performance.

【29】 On-Device Content Moderation 标题:设备上的内容审核

作者:Anchal Pandey,Sukumar Moharana,Debi Prasanna Mohanty,Archit Panwar,Dewang Agarwal,Siva Prasad Thota 机构:On Device AI, Samsung R&D Bangalore, Bangalore, India 链接:https://arxiv.org/abs/2107.11845 摘要:随着互联网的出现,不适宜工作场合(NSFW)的内容审核是当今的一个主要问题。由于智能手机现在已经成为数十亿人日常生活的一部分,因此拥有一个能够检测手机上潜在NSFW内容并向用户提示的解决方案变得更加重要。本文提出了一种新的设备端NSFW图像检测方法。除了传统的色情内容审核,我们还纳入了半裸体内容审核,因为在庞大的人群中它仍属于NSFW内容。我们整理了一个包含三大类的数据集,即裸体、半裸体和安全图像。我们创建了一个由对象检测器和分类器组成的集成模型,用于过滤裸体和半裸体内容。该解决方案提供不安全身体部位的标注以及半裸体图像的识别。我们在几个公共数据集和自定义数据集上对所提出的解决方案进行了广泛的测试。该模型在自定义的NSFW16k数据集上的F1得分为0.91,查准率为95%,查全率为88%,在NPDI数据集上的MAP为0.92。此外,它在多个安全图像开放数据集上的平均假阳性率为0.002。 摘要:With the advent of the internet, not-safe-for-work (NSFW) content moderation is a major problem today. Since smartphones are now part of the daily life of billions of people, it becomes even more important to have a solution which could detect and suggest to the user potential NSFW content present on their phone. In this paper we present a novel on-device solution for detecting NSFW images. In addition to conventional pornographic content moderation, we have also included semi-nude content moderation as it is still NSFW in a large demography. We have curated a dataset comprising three major categories, namely nude, semi-nude and safe images. We have created an ensemble of an object detector and a classifier for filtering of nude and semi-nude contents. The solution provides unsafe body part annotations along with identification of semi-nude images. We extensively tested our proposed solution on several public datasets and also on our custom dataset. The model achieves an F1 score of 0.91 with 95% precision and 88% recall on our custom NSFW16k dataset and 0.92 MAP on the NPDI dataset. Moreover, it achieves an average 0.002 false positive rate on a collection of safe-image open datasets.

【30】 A binary variant of gravitational search algorithm and its application to windfarm layout optimization problem 标题:引力搜索算法的二进制变体及其在风电场布局优化问题中的应用

作者:Susheel Kumar Joshi,Jagdish Chand Bansal 链接:https://arxiv.org/abs/2107.11844 摘要:在二进制搜索空间中,GSA框架存在停滞、多样性丢失、过早收敛和时间复杂度高等缺点。为了解决这些问题,本文提出了一种新的GSA二进制变体,称为"二进制搜索空间中嵌入邻域档案的引力常数GSA"(BNAGGSA)。在BNAGGSA中,新的基于适应度-距离的社会交互策略产生了一种自适应步长机制,通过该机制,agent可以根据当前的搜索需求,以最优步长向最优方向移动。在23个著名的基准测试问题上,将该算法与GSA的两个二进制变体进行了性能比较。实验结果和统计分析证明了BNAGGSA算法优于其他算法。此外,为了验证该算法在实际应用中的适用性,还考虑了一个风电场布局优化问题。以两个不同风场的两组不同风场数据为例进行了实验研究。 摘要:In the binary search space, the GSA framework encounters the shortcomings of stagnation, diversity loss, premature convergence and high time complexity. To address these issues, a novel binary variant of GSA called "A novel neighbourhood archives embedded gravitational constant in GSA for binary search space" (BNAGGSA) is proposed in this paper. In BNAGGSA, the novel fitness-distance based social interaction strategy produces a self-adaptive step size mechanism through which the agent moves towards the optimal direction with the optimal step size, as per its current search requirement. The performance of the proposed algorithm is compared with the two binary variants of GSA over 23 well-known benchmark test problems. The experimental results and statistical analyses prove the supremacy of BNAGGSA over the compared algorithms. Furthermore, to check the applicability of the proposed algorithm in solving real-world applications, a windfarm layout optimization problem is considered. Two case studies with two different wind data sets of two different wind sites is considered for experiments.
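For context, binary GSA variants typically map a real-valued velocity to a bit-flip probability through a transfer function. The sketch below shows the classic |tanh| rule used by standard binary GSA, not the fitness-distance step-size mechanism that BNAGGSA proposes.

```python
import math
import random

def binary_position_update(position, velocity, rng=random):
    """Classic binary-GSA position update: flip bit i with prob |tanh(v_i)|.

    position: list of 0/1 bits; velocity: list of floats from the GSA
    force/acceleration update (sketch of the standard transfer rule)."""
    new_position = []
    for bit, v in zip(position, velocity):
        if rng.random() < abs(math.tanh(v)):
            new_position.append(1 - bit)   # flip the bit
        else:
            new_position.append(bit)       # keep the bit
    return new_position
```

Large velocity magnitudes make flips near-certain while near-zero velocities freeze the bit, which is exactly where stagnation in the plain binary framework comes from.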

【31】 New Algebraic Normative Theories for Ethical and Legal Reasoning in the LogiKEy Framework 标题:LogiKEy框架中伦理和法律推理的新代数规范理论

作者:Ali Farjami 机构: University of Luxembourg 链接:https://arxiv.org/abs/2107.11838 摘要:为了设计和构建伦理与法律推理器以及负责任的系统,Benzmüller、Parent和van der Torre引入了LogiKEy方法,该方法基于道义逻辑在经典高阶逻辑中的语义嵌入。本文用代数方法对LogiKEy道义逻辑和数据集进行了大幅扩展。我们在布尔代数的基础上发展了用于规范推理的输入/输出操作理论。 摘要:To design and engineer ethical and legal reasoners and responsible systems, Benzmüller, Parent and van der Torre introduce the LogiKEy methodology based on the semantical embedding of deontic logics into classic higher-order logic. In this paper, we considerably extend the LogiKEy deontic logics and dataset using an algebraic approach. We develop a theory of input/output operations for normative reasoning on top of Boolean algebras.

【32】 Distributional Shifts in Automated Diabetic Retinopathy Screening 标题:糖尿病视网膜病变自动筛查中的分布偏移

作者:Jay Nandy,Wynne Hsu,Mong Li Lee 机构:School of Computing, National University of Singapore, Institute of Data Science, National University of Singapore 备注:Accepted at IEEE ICIP 2021 链接:https://arxiv.org/abs/2107.11822 摘要:在糖尿病视网膜病变(DR)筛查中,基于深度学习的模型可以自动检测视网膜图像是否“可参考”。然而,当输入图像分布偏离训练分布时,分类精度下降。此外,即使输入的不是视网膜图像,标准的DR分类器也会产生一个高度自信的预测,即该图像是“可参考的”。本文提出了一个基于Dirichlet先验网络的框架来解决这个问题。它利用了一个非分布(OOD)检测器模型和一个DR分类模型,通过识别OOD图像来提高泛化能力。在真实数据集上的实验表明,该框架能够消除未知的非视网膜图像,识别出分布移位的视网膜图像,便于人工干预。 摘要:Deep learning-based models are developed to automatically detect if a retina image is `referable' in diabetic retinopathy (DR) screening. However, their classification accuracy degrades as the input images distributionally shift from their training distribution. Further, even if the input is not a retina image, a standard DR classifier produces a high confident prediction that the image is `referable'. Our paper presents a Dirichlet Prior Network-based framework to address this issue. It utilizes an out-of-distribution (OOD) detector model and a DR classification model to improve generalizability by identifying OOD images. Experiments on real-world datasets indicate that the proposed framework can eliminate the unknown non-retina images and identify the distributionally shifted retina images for human intervention.
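A Dirichlet Prior Network predicts concentration parameters rather than a single softmax distribution. The sketch below shows one common way to read off expected class probabilities and a low-evidence OOD signal from the predicted alphas; this proxy is an assumption for illustration, not necessarily the paper's exact measure.

```python
import math

def dirichlet_summary(alphas):
    """Expected class probabilities and uncertainty proxies from Dirichlet alphas.

    A small total concentration alpha0 (a flat, low-evidence Dirichlet)
    is a common signal that the input is out-of-distribution."""
    alpha0 = sum(alphas)
    probs = [a / alpha0 for a in alphas]
    # Entropy of the expected categorical distribution (total uncertainty)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return probs, alpha0, entropy
```

An in-distribution retina image would yield large alphas (confident, high alpha0), whereas a non-retina input would yield a flat low-alpha0 Dirichlet that can be routed to human review.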

【33】 Bangla sign language recognition using concatenated BdSL network 标题:基于级联BDSL网络的孟加拉手语识别

作者:Thasin Abedin,Khondokar S. S. Prottoy,Ayana Moshruba,Safayat Bin Hakim 机构:Department of Electrical and Electronic Engineering, Islamic University of Technology (IUT) 链接:https://arxiv.org/abs/2107.11818 摘要:手语是听障人士和聋哑人群体交流的唯一媒介。因此,与大众的沟通对这个少数群体来说始终是一个挑战。特别是在孟加拉语手语(BdSL)中,有38个字母,其中一些符号几乎相同。因此,在BdSL识别中,除了从传统的卷积神经网络(CNN)中提取视觉特征外,手的姿态也是一个重要的因素。本文提出了一种由CNN图像网络和姿态估计网络组成的级联BdSL网络结构。图像网络在获取视觉特征的同时,通过姿态估计网络获取手部关键点的相对位置,获得附加特征,以应对BdSL符号的复杂性。该新方法在测试集上取得了91.51%的分数,实验结果也表明了附加姿态估计网络的有效性。 摘要:Sign language is the only medium of communication for the hearing impaired and the deaf and dumb community. Communication with the general mass is thus always a challenge for this minority group. Especially in Bangla sign language (BdSL), there are 38 alphabets with some having nearly identical symbols. As a result, in BdSL recognition, the posture of hand is an important factor in addition to visual features extracted from traditional Convolutional Neural Network (CNN). In this paper, a novel architecture "Concatenated BdSL Network" is proposed which consists of a CNN based image network and a pose estimation network. While the image network gets the visual features, the relative positions of hand keypoints are taken by the pose estimation network to obtain the additional features to deal with the complexity of the BdSL symbols. A score of 91.51% was achieved by this novel approach on the test set, and the effectiveness of the additional pose estimation network is suggested by the experimental results.

【34】 Go Wider Instead of Deeper 标题:走得更广,而不是更深

作者:Fuzhao Xue,Ziji Shi,Yuxuan Lou,Yong Liu,Yang You 机构:Department of Computer Science, National University of Singapore, Singapore 链接:https://arxiv.org/abs/2107.11817 摘要:Transformer最近在各种任务上取得了令人印象深刻的成果。为了进一步提高Transformer的有效性和效率,现有的工作有两个思路:(1)通过扩展到更多的可训练参数使Transformer变得更宽;(2)通过参数共享或沿深度压缩模型使其变得更浅。然而,当可用于训练的令牌较少时,较大的模型通常不能很好地扩展,并且当模型非常大时,需要高级并行技术。与原始Transformer模型相比,较小的模型通常由于表征能力的损失而获得较差的性能。在本文中,为了在可训练参数较少的情况下获得更好的性能,我们提出了一个框架来有效地部署可训练参数,方法是变得更宽而不是更深。特别地,我们用混合专家(MoE)代替前馈网络(FFN),沿模型宽度进行缩放。然后,我们使用单独的层规范化跨Transformer块共享MoE层。这样的部署起到了转换各种语义表示的作用,使得模型的参数更为高效和有效。为了评估我们的框架,我们设计了WideNet并在ImageNet-1K上进行了评估。我们最好的模型比视觉Transformer(ViT)高出1.46%,而可训练参数仅为其0.72倍。在仅使用0.46倍和0.13倍参数的情况下,我们的WideNet仍然可以分别超过ViT和ViT-MoE 0.83%和2.08%。 摘要:The transformer has recently achieved impressive results on various tasks. To further improve the effectiveness and efficiency of the transformer, there are two trains of thought among existing works: (1) going wider by scaling to more trainable parameters; (2) going shallower by parameter sharing or model compressing along with the depth. However, larger models usually do not scale well when fewer tokens are available to train, and advanced parallelisms are required when the model is extremely large. Smaller models usually achieve inferior performance compared to the original transformer model due to the loss of representation power. In this paper, to achieve better performance with fewer trainable parameters, we propose a framework to deploy trainable parameters efficiently, by going wider instead of deeper. Specially, we scale along model width by replacing feed-forward network (FFN) with mixture-of-experts (MoE). We then share the MoE layers across transformer blocks using individual layer normalization. Such deployment plays the role to transform various semantic representations, which makes the model more parameter-efficient and effective. To evaluate our framework, we design WideNet and evaluate it on ImageNet-1K. 
Our best model outperforms Vision Transformer (ViT) by $1.46\%$ with $0.72 \times$ trainable parameters. Using $0.46 \times$ and $0.13 \times$ parameters, our WideNet can still surpass ViT and ViT-MoE by $0.83\%$ and $2.08\%$, respectively.
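为直观说明“跨Transformer块共享MoE层、但各块使用独立层规范化”的参数部署方式,下面给出一个极简的NumPy示意(并非论文官方实现;维度、top-1路由方式等均为演示用假设,真实WideNet使用可训练路由器与ViT骨干):

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_experts, n_blocks = 8, 4, 3

# 一套专家FFN权重,被所有Transformer块共享(沿宽度而非深度扩展)。
experts = [(rng.normal(size=(d, d)), rng.normal(size=(d, d))) for _ in range(n_experts)]
router = rng.normal(size=(d, n_experts))
# 每个块各自独立的LayerNorm参数,不共享("individual layer normalization")。
norms = [(np.ones(d), np.zeros(d)) for _ in range(n_blocks)]

def layer_norm(x, gamma, beta, eps=1e-5):
    mu, var = x.mean(-1, keepdims=True), x.var(-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

def moe_ffn(x):
    # top-1路由:每个token送入得分最高的专家
    e = int(np.argmax(x @ router))
    w1, w2 = experts[e]
    return np.maximum(x @ w1, 0.0) @ w2

x = rng.normal(size=d)
for gamma, beta in norms:   # 同一套MoE权重,块级独立的LayerNorm
    x = x + moe_ffn(layer_norm(x, gamma, beta))
print(x.shape)
```

可以看到,专家权重只存储一份并被所有块复用,而每个块仍保留自己的LayerNorm参数,从而在共享权重下建模不同层次的语义表示。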

【35】 Reinforced Imitation Learning by Free Energy Principle 标题:利用自由能原理强化模仿学习

作者:Ryoya Ogishima,Izumi Karino,Yasuo Kuniyoshi 机构:Graduate School of Information Science and Technology, The University of Tokyo 链接:https://arxiv.org/abs/2107.11811 摘要:强化学习(RL)需要大量的探索,特别是在稀疏奖励环境下。模仿学习(IL)可以从专家的演示中学习而不需要探索,但它永远不会超过专家的表现,而且很容易在演示和执行之间发生分布偏移。本文基于自由能原理(FEP),从根本上统一了RL和IL。FEP是一种关于大脑的统一贝叶斯理论,它用共同的基本原则解释知觉、行动和模型学习。我们提出了FEP的一个理论扩展,并推导了一个算法,在该算法中,智能体学习一个内化了专家演示的世界模型,同时使用该模型来推断能使回报最大化的当前和未来的状态与动作。因此,该算法通过部分模仿专家并以无缝方式最大化其回报来降低探索成本,从而获得比次优专家更高的性能。实验结果表明,该方法在稀疏奖励环境下的视觉控制任务中具有良好的应用前景。 摘要:Reinforcement Learning (RL) requires a large amount of exploration especially in sparse-reward settings. Imitation Learning (IL) can learn from expert demonstrations without exploration, but it never exceeds the expert's performance and is also vulnerable to distributional shift between demonstration and execution. In this paper, we radically unify RL and IL based on Free Energy Principle (FEP). FEP is a unified Bayesian theory of the brain that explains perception, action and model learning by a common fundamental principle. We present a theoretical extension of FEP and derive an algorithm in which an agent learns the world model that internalizes expert demonstrations and at the same time uses the model to infer the current and future states and actions that maximize rewards. The algorithm thus reduces exploration costs by partially imitating experts as well as maximizing its return in a seamless way, resulting in a higher performance than the suboptimal expert. Our experimental results show that this approach is promising in visual control tasks especially in sparse-reward environments.

【36】 Character Spotting Using Machine Learning Techniques 标题:基于机器学习技术的字符定位

作者:P Preethi,Hrishikesh Viswanath 机构:Department of Computer Science and Engineering, PES University 链接:https://arxiv.org/abs/2107.11795 摘要:这项工作对用于分割图像文本字符的机器学习算法进行了比较。这些算法设计用于处理文本未按规整方式对齐的降质文档。本文研究了利用支持向量机、K近邻算法和编码器网络进行字符定位的方法。字符定位是指通过选取由空白界定的区域,从文本流中提取候选字符。 摘要:This work presents a comparison of machine learning algorithms that are implemented to segment the characters of text presented as an image. The algorithms are designed to work on degraded documents with text that is not aligned in an organized fashion. The paper investigates the use of Support Vector Machines, K-Nearest Neighbor algorithm and an Encoder Network to perform the operation of character spotting. Character Spotting involves extracting potential characters from a stream of text by selecting regions bound by white space.
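摘要中“通过选取由空白界定的区域来提取候选字符”这一步,可以用如下极简NumPy草图示意(仅为演示假设:输入为二值化的单行文本图像,1表示墨迹;真实系统还需处理倾斜与降质):

```python
import numpy as np

def spot_characters(img):
    """将二值文本行图像(1=墨迹,0=背景)按全白列切分为候选字符区域。"""
    ink_cols = img.sum(axis=0) > 0          # 含有墨迹的列
    regions, start = [], None
    for i, has_ink in enumerate(ink_cols):
        if has_ink and start is None:
            start = i
        elif not has_ink and start is not None:
            regions.append((start, i))       # [start, end) 的列区间
            start = None
    if start is not None:
        regions.append((start, len(ink_cols)))
    return regions

# 玩具示例:两个"字符"被一列空白隔开。
line = np.array([[1, 1, 0, 1],
                 [1, 0, 0, 1]])
print(spot_characters(line))  # [(0, 2), (3, 4)]
```

切分出的区域即可送入SVM、KNN或编码器网络进行后续分类。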

【37】 ROD: Reception-aware Online Distillation for Sparse Graphs 标题:ROD:稀疏图的接收感知在线蒸馏

作者:Wentao Zhang,Yuezihan Jiang,Yang Li,Zeang Sheng,Yu Shen,Xupeng Miao,Liang Wang,Zhi Yang,Bin Cui 机构:†School of EECS & Key Laboratory of High Confidence Software Technologies, Peking University §Center for Data, Science, Peking University & National Engineering Laboratory for Big Data Analysis and Applications ‡Alibaba Group 链接:https://arxiv.org/abs/2107.11789 摘要:图神经网络(GNNs)广泛应用于节点分类、链路预测、节点聚类等基于图的任务中。然而,GNNs的性能优势主要来自于对图的边缘进行特征传播和平滑,因此需要足够的连通性和标签信息来进行有效的传播。不幸的是,现实世界中的许多网络在边缘和标签方面都是稀疏的,导致GNNs的性能处于次优状态。最近人们对这个稀疏问题的兴趣集中在自训练方法上,这种方法用伪标签扩展监督信号。然而,由于伪标号的数量和质量都不尽如人意,自学习方法本身就不能充分发挥细化稀疏图学习性能的潜力。在本文中,我们提出了一种新的接收感知稀疏图学习的在线知识提取方法ROD。我们为ROD设计了三种监督信号:多尺度接收感知图形知识、基于任务的监督和丰富的提炼知识,允许以同伴教学方式进行在线知识转移。为了提取隐藏在多尺度接收域中的知识,ROD明确要求单个学生模型保留不同层次的局部信息。对于一个给定的任务,每个学生根据自己的接受量表知识进行预测,同时结合多量表知识动态建立一个强大的教师。我们的方法已经在9个数据集和各种基于图的任务上进行了广泛的评估,包括节点分类、链接预测和节点聚类。结果表明,ROD算法具有良好的性能,对图的稀疏性具有较强的鲁棒性。 摘要:Graph neural networks (GNNs) have been widely used in many graph-based tasks such as node classification, link prediction, and node clustering. However, GNNs gain their performance benefits mainly from performing the feature propagation and smoothing across the edges of the graph, thus requiring sufficient connectivity and label information for effective propagation. Unfortunately, many real-world networks are sparse in terms of both edges and labels, leading to sub-optimal performance of GNNs. Recent interest in this sparse problem has focused on the self-training approach, which expands supervised signals with pseudo labels. Nevertheless, the self-training approach inherently cannot realize the full potential of refining the learning performance on sparse graphs due to the unsatisfactory quality and quantity of pseudo labels. In this paper, we propose ROD, a novel reception-aware online knowledge distillation approach for sparse graph learning. 
We design three supervision signals for ROD: multi-scale reception-aware graph knowledge, task-based supervision, and rich distilled knowledge, allowing online knowledge transfer in a peer-teaching manner. To extract knowledge concealed in the multi-scale reception fields, ROD explicitly requires individual student models to preserve different levels of locality information. For a given task, each student would predict based on its reception-scale knowledge, while simultaneously a strong teacher is established on-the-fly by combining multi-scale knowledge. Our approach has been extensively evaluated on 9 datasets and a variety of graph-based tasks, including node classification, link prediction, and node clustering. The result demonstrates that ROD achieves state-of-the-art performance and is more robust for the graph sparsity.

【38】 Learn to Focus: Hierarchical Dynamic Copy Network for Dialogue State Tracking 标题:学会聚焦:用于对话状态跟踪的分层动态复制网络

作者:Linhao Zhang,Houfeng Wang 机构:MOE Key Lab of Computational Linguistics, Peking University, Beijing, China 链接:https://arxiv.org/abs/2107.11778 摘要:近年来,研究者们探索了利用编解码框架来解决面向任务的对话系统中的一个关键组成部分:对话状态跟踪问题。然而,他们将多轮对话视为一个扁平的序列,当序列很长时无法聚焦于有用信息。在本文中,我们提出了一个分层动态复制网络(HDCN),以便于关注信息量最大的话轮,从而更容易从对话上下文中提取时隙值。在编解码框架的基础上,我们采用分层复制的方法,在词级和话轮级计算两级注意力,然后对其进行重新规范化,得到最终的拷贝分布。使用焦点损失项来鼓励模型将最高的话轮级注意力权重分配给信息量最大的话轮。实验结果表明,该模型在MultiWOZ 2.1数据集上的联合精度达到46.76%。 摘要:Recently, researchers have explored using the encoder-decoder framework to tackle dialogue state tracking (DST), which is a key component of task-oriented dialogue systems. However, they regard a multi-turn dialogue as a flat sequence, failing to focus on useful information when the sequence is long. In this paper, we propose a Hierarchical Dynamic Copy Network (HDCN) to facilitate focusing on the most informative turn, making it easier to extract slot values from the dialogue context. Based on the encoder-decoder framework, we adopt a hierarchical copy approach that calculates two levels of attention at the word- and turn-level, which are then renormalized to obtain the final copy distribution. A focus loss term is employed to encourage the model to assign the highest turn-level attention weight to the most informative turn. Experimental results show that our model achieves 46.76% joint accuracy on the MultiWOZ 2.1 dataset.
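其中“在词级和话轮级计算两级注意力并重新规范化得到最终拷贝分布”的做法,可以用下面的NumPy草图示意(注意力分数均为演示假设,真实模型中由编码器状态计算得到):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def hierarchical_copy(word_scores, turn_scores):
    """word_scores: (话轮数, 词数) 每个话轮内各词的原始分数;
    turn_scores: (话轮数,) 各话轮的原始分数。
    最终拷贝分布 = 话轮级注意力 * 话轮内词级注意力,
    对所有话轮的所有词重新规范化后和为1。"""
    word_att = softmax(word_scores, axis=1)   # 词级注意力(每个话轮内)
    turn_att = softmax(turn_scores)           # 话轮级注意力
    return word_att * turn_att[:, None]

copy = hierarchical_copy(np.array([[2.0, 0.0], [1.0, 1.0]]),
                         np.array([0.5, 1.5]))
print(round(copy.sum(), 6))  # 1.0
```

焦点损失则进一步鼓励 turn_att 把最大权重放在信息量最大的话轮上。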

【39】 A Joint and Domain-Adaptive Approach to Spoken Language Understanding 标题:一种联合的、领域自适应的口语理解方法

作者:Linhao Zhang,Yu Shi,Linjun Shou,Ming Gong,Houfeng Wang,Michael Zeng 机构:MOE Key Lab of Computational Linguistics, Peking University, Microsoft 链接:https://arxiv.org/abs/2107.11768 摘要:口语理解(SLU)由两个子任务组成:意图检测(ID)和时隙填充(SF)。关于SLU的研究有两条线。一个是联合处理这两个子任务以提高其预测精度,另一个则侧重于其中一个子任务的域适应能力。在本文中,我们尝试将这两条研究路线连接起来,并提出一种联合域自适应的SLU方法。我们将SLU描述为一个约束生成任务,并利用基于领域特定本体的动态词汇表。我们在ASMixed和MTOD数据集上进行了实验,取得了与以前最先进的联合模型相比较的性能。此外,结果表明,我们的联合模型可以有效地适应一个新的领域。 摘要:Spoken Language Understanding (SLU) is composed of two subtasks: intent detection (ID) and slot filling (SF). There are two lines of research on SLU. One jointly tackles these two subtasks to improve their prediction accuracy, and the other focuses on the domain-adaptation ability of one of the subtasks. In this paper, we attempt to bridge these two lines of research and propose a joint and domain adaptive approach to SLU. We formulate SLU as a constrained generation task and utilize a dynamic vocabulary based on domain-specific ontology. We conduct experiments on the ASMixed and MTOD datasets and achieve competitive performance with previous state-of-the-art joint models. Besides, results show that our joint model can be effectively adapted to a new domain.

【40】 DR2L: Surfacing Corner Cases to Robustify Autonomous Driving via Domain Randomization Reinforcement Learning 标题:DR2L:通过领域随机化强化学习暴露极端案例以增强自动驾驶鲁棒性

作者:Haoyi Niu,Jianming Hu,Zheyu Cui,Yi Zhang 机构:Department of Automation, Tsinghua University, Beijing, China 备注:8 pages, 7 figures 链接:https://arxiv.org/abs/2107.11762 摘要:在深度强化学习(DeepRL)自动驾驶的背景下,如何尽可能高效、彻底地探索极端案例(corner cases)一直是人们最关注的问题之一。用模拟数据进行训练比使用真实数据成本更低、危险性更小,但参数分布的不一致和模拟器中系统建模的不准确,总是会导致不可避免的Sim2real差距,这很可能是模型在模拟器难以生成的新颖、异常和危险案例中表现不佳的原因。领域随机化(DR)是一种可以在很少或没有真实数据的情况下弥合这一差距的方法。因此,本研究提出了一个对抗性模型,通过逐步暴露更困难的事件来增强在模拟中训练的基于DeepRL的自动驾驶车辆的鲁棒性,从而使模型能够顺利迁移到现实世界。 摘要:How to explore corner cases as efficiently and thoroughly as possible has long been one of the top concerns in the context of deep reinforcement learning (DeepRL) autonomous driving. Training with simulated data is less costly and dangerous than utilizing real-world data, but the inconsistency of parameter distribution and the incorrect system modeling in simulators always lead to an inevitable Sim2real gap, which probably accounts for the underperformance in novel, anomalous and risky cases that simulators can hardly generate. Domain Randomization(DR) is a methodology that can bridge this gap with little or no real-world data. Consequently, in this research, an adversarial model is put forward to robustify DeepRL-based autonomous vehicles trained in simulation to gradually surfacing harder events, so that the models could readily transfer to the real world.

【41】 Learning Risk-aware Costmaps for Traversability in Challenging Environments 标题:学习具有风险意识的成本图,以便在具有挑战性的环境中实现可旅行性

作者:David D. Fan,Ali-akbar Agha-mohammadi,Evangelos A. Theodorou 机构:Institute for Robotics and Intelligent Machines, Georgia Institute of Technology; California Institute of Technology 链接:https://arxiv.org/abs/2107.11722 摘要:在未知和非结构化环境中,机器人自主探索和导航的主要挑战之一是确定机器人可以或不能安全移动的位置。这种确定的一个重要困难来源是随机性和不确定性,来自定位误差、传感器稀疏性和噪声、难以建模的机器人-地面相互作用以及对车辆运动的干扰。解决这个问题的经典方法依赖于对周围地形的几何分析,这很容易产生建模错误,并且计算成本很高。此外,对不确定可遍历性代价的分布进行建模是一项困难的任务,由于上述各种误差源的存在,使得建模变得更加复杂。在这项工作中,我们采取原则性的学习方法来解决这个问题。我们介绍了一个神经网络结构的鲁棒学习分布的遍历性成本。由于我们的动机是保护机器人的生命,因此我们从学习尾部风险的角度来解决这个学习问题,即条件风险值(CVaR)。我们证明,这种方法可靠地学习期望的尾部风险给定一个期望的概率风险阈值在0和1之间,产生了一个遍历性成本图,它对异常值更稳健,更准确地捕捉尾部风险,并且与基线相比计算效率更高。我们通过一个步行机器人在充满挑战的非结构化环境中(包括废弃的地铁、石灰岩洞穴和熔岩管洞穴)进行数据采集,验证了我们的方法。 摘要:One of the main challenges in autonomous robotic exploration and navigation in unknown and unstructured environments is determining where the robot can or cannot safely move. A significant source of difficulty in this determination arises from stochasticity and uncertainty, coming from localization error, sensor sparsity and noise, difficult-to-model robot-ground interactions, and disturbances to the motion of the vehicle. Classical approaches to this problem rely on geometric analysis of the surrounding terrain, which can be prone to modeling errors and can be computationally expensive. Moreover, modeling the distribution of uncertain traversability costs is a difficult task, compounded by the various error sources mentioned above. In this work, we take a principled learning approach to this problem. We introduce a neural network architecture for robustly learning the distribution of traversability costs. Because we are motivated by preserving the life of the robot, we tackle this learning problem from the perspective of learning tail-risks, i.e. the Conditional Value-at-Risk (CVaR). 
We show that this approach reliably learns the expected tail risk given a desired probability risk threshold between 0 and 1, producing a traversability costmap which is more robust to outliers, more accurately captures tail risks, and is more computationally efficient, when compared against baselines. We validate our method on data collected by a legged robot navigating challenging, unstructured environments including an abandoned subway, limestone caves, and lava tube caves.
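文中学习的尾部风险即条件风险值(CVaR)。下面用NumPy给出其经验估计的一个简单示意(数据为假设的可通行性代价样本,并非论文实验数据):

```python
import numpy as np

def cvar(costs, alpha):
    """经验CVaR:代价分布中最差的 (1 - alpha) 尾部(超过alpha分位数部分)的均值。"""
    var = np.quantile(costs, alpha)     # 风险水平alpha下的VaR(分位数)
    tail = costs[costs >= var]
    return tail.mean()

costs = np.array([1.0, 2.0, 3.0, 4.0, 100.0])  # 含一个罕见的灾难性代价
print(cvar(costs, 0.0))   # 22.0  -> alpha=0 时退化为普通期望
print(cvar(costs, 0.8))   # 100.0 -> 最差20%尾部的均值
```

可见当阈值alpha升高时,CVaR越来越关注灾难性的尾部代价,这正是“保护机器人生命”动机下比期望代价更保守的度量。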

【42】 A Real Use Case of Semi-Supervised Learning for Mammogram Classification in a Local Clinic of Costa Rica 标题:半监督学习在哥斯达黎加当地诊所乳房X光片分类中的实际应用

作者:Saul Calderon-Ramirez,Diego Murillo-Hernandez,Kevin Rojas-Salazar,David Elizondo,Shengxiang Yang,Miguel Molina-Cabello 链接:https://arxiv.org/abs/2107.11696 摘要:应用基于深度学习的计算机辅助诊断系统对乳腺x线图像进行分类,有助于提高诊断的准确性、可靠性和成本。然而,训练一个深度学习模型需要大量的标记图像,这可能是昂贵的,因为需要时间和精力从临床医生获得。利用不同医院和诊所的数据建立了许多公开的数据集。然而,使用在这些数据集上训练的模型来处理从不同医院或诊所采集的图像可能会导致性能降低。这是由于数据集的分布不匹配,包括不同的患者群体和图像采集协议。标记数据的稀缺性也给迁移学习的应用带来了挑战,使用这些源数据集训练模型。在这项工作中,一个真实世界的情况下进行评估,其中一个新的目标数据集抽样从一个私人哥斯达黎加诊所使用,很少标签和严重不平衡的数据。使用两个流行的和公开可用的数据集(INbreast和CBIS-DDSM)作为源数据,在新的目标数据集上训练和测试模型。提出并评估了利用半监督深度学习方法MixMatch来利用目标数据集中的未标记数据。在测试中,模型的性能被广泛地测量,使用不同的度量来评估分类器在严重数据不平衡条件下的性能。结果表明,当使用稀少的标记观测值时,使用半监督深度学习结合微调可以提供有意义的优势。为了社区的利益,我们提供了新的数据集。 摘要:The implementation of deep learning based computer aided diagnosis systems for the classification of mammogram images can help in improving the accuracy, reliability, and cost of diagnosing patients. However, training a deep learning model requires a considerable amount of labeled images, which can be expensive to obtain as time and effort from clinical practitioners is required. A number of publicly available datasets have been built with data from different hospitals and clinics. However, using models trained on these datasets for later work on images sampled from a different hospital or clinic might result in lower performance. This is due to the distribution mismatch of the datasets, which include different patient populations and image acquisition protocols. The scarcity of labeled data can also bring a challenge towards the application of transfer learning with models trained using these source datasets. In this work, a real world scenario is evaluated where a novel target dataset sampled from a private Costa Rican clinic is used, with few labels and heavily imbalanced data. The use of two popular and publicly available datasets (INbreast and CBIS-DDSM) as source data, to train and test the models on the novel target dataset, is evaluated. 
The use of the semi-supervised deep learning approach known as MixMatch, to leverage the usage of unlabeled data from the target dataset, is proposed and evaluated. In the tests, the performance of models is extensively measured, using different metrics to assess the performance of a classifier under heavy data imbalance conditions. It is shown that the use of semi-supervised deep learning combined with fine-tuning can provide a meaningful advantage when using scarce labeled observations. We make available the novel dataset for the benefit of the community.

【43】 Graph Convolutional Network with Generalized Factorized Bilinear Aggregation 标题:具有广义因子双线性聚集的图卷积网络

作者:Hao Zhu,Piotr Koniusz 机构:Australian National University and Data61, CSIRO, Canberra, Australia 链接:https://arxiv.org/abs/2107.11666 摘要:尽管图卷积网络(GCN)在各种应用中显示了其强大的功能,但作为GCN最重要组成部分的图卷积层,仍然使用线性变换和简单的池化步骤。在本文中,我们提出了一种新的广义因子化双线性(FB)层来建模GCN中的特征交互。FB层执行两次矩阵-向量乘法,即将权重矩阵与来自两侧的隐藏特征向量的外积相乘。然而,FB层存在系数数量随维度二次增长、过拟合,以及由于隐藏表示通道间的相关性违反独立同分布(i.i.d.)假设而产生虚假相关等问题。因此,我们通过定义一族应用于二次项的总结运算符,提出了一种紧凑的FB层。我们分析了所提出的池化运算符并论证了其使用动机。在多个数据集上的实验结果表明,GFB-GCN在文本分类方面与其他方法相比具有竞争力。 摘要:Although Graph Convolutional Networks (GCNs) have demonstrated their power in various applications, the graph convolutional layers, as the most important component of GCN, are still using linear transformations and a simple pooling step. In this paper, we propose a novel generalization of Factorized Bilinear (FB) layer to model the feature interactions in GCNs. FB performs two matrix-vector multiplications, that is, the weight matrix is multiplied with the outer product of the vector of hidden features from both sides. However, the FB layer suffers from the quadratic number of coefficients, overfitting and the spurious correlations due to correlations between channels of hidden representations that violate the i.i.d. assumption. Thus, we propose a compact FB layer by defining a family of summarizing operators applied over the quadratic term. We analyze proposed pooling operators and motivate their use. Our experimental results on multiple datasets demonstrate that the GFB-GCN is competitive with other methods for text classification.
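下面用NumPy示意“因子化双线性项 + 对二次项应用总结运算符”的紧凑形式(维度、以求和作为总结运算符等均为演示假设;论文中给出的是一族可选的总结运算符):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 6, 3                      # 隐藏维度、因子化秩

W = rng.normal(size=(d, d))      # 线性项权重
F = rng.normal(size=(d, r))      # 双线性项的低秩因子

def fb_layer(h):
    """因子化双线性交互:线性项 + 被总结的二次项。
    (F.T @ h) ** 2 求和等价于 h^T (F F^T) h,
    无需显式构造 d x d 的外积;这里的总结运算符取简单求和。"""
    linear = W @ h
    quadratic = np.sum((F.T @ h) ** 2)   # 对成对交互的标量总结
    return linear + quadratic

out = fb_layer(rng.normal(size=d))
print(out.shape)  # (6,)
```

低秩因子化将二次项的系数数量从 O(d^2) 降到 O(d*r),这正是“紧凑FB层”缓解系数二次增长与过拟合的出发点。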

【44】 Stress Test Evaluation of Biomedical Word Embeddings 标题:生物医学词嵌入的压力测试评价

作者:Vladimir Araujo,Andrés Carvallo,Carlos Aspillaga,Camilo Thorne,Denis Parra 机构:Pontificia Universidad Católica de Chile, Millennium Institute for Foundational Research on Data (IMFD), Elsevier 备注:Accepted paper BioNLP2021 链接:https://arxiv.org/abs/2107.11652 摘要:预训练词嵌入的成功推动了其在生物医学领域的应用,语境化嵌入在一些生物医学自然语言处理任务中取得了显著的效果。然而,目前还缺乏对它们在严重“压力”情境下行为的定量研究。在这项工作中,我们用对抗性样本(即自动构建的测试,它使我们能够检验模型的鲁棒性)系统地评估了三种语言模型。我们提出了两种类型的压力情景,聚焦于生物医学命名实体识别(NER)任务:一种受拼写错误启发,另一种基于医学术语同义词的使用。我们在三个基准上进行的实验表明,原始模型的性能显著下降,同时也揭示了它们的弱点和优点。最后,我们证明了对抗性训练可以提高模型的鲁棒性,甚至在某些情况下超过原始性能。 摘要:The success of pretrained word embeddings has motivated their use in the biomedical domain, with contextualized embeddings yielding remarkable results in several biomedical NLP tasks. However, there is a lack of research on quantifying their behavior under severe "stress" scenarios. In this work, we systematically evaluate three language models with adversarial examples -- automatically constructed tests that allow us to examine how robust the models are. We propose two types of stress scenarios focused on the biomedical named entity recognition (NER) task, one inspired by spelling errors and another based on the use of synonyms for medical terms. Our experiments with three benchmarks show that the performance of the original models decreases considerably, in addition to revealing their weaknesses and strengths. Finally, we show that adversarial training causes the models to improve their robustness and even to exceed the original performance in some cases.

【45】 Clustering by Maximizing Mutual Information Across Views 标题:通过最大化视图间的交互信息进行聚类

作者:Kien Do,Truyen Tran,Svetha Venkatesh 机构:Applied Artificial Intelligence Institute (A2I2), Deakin University, Geelong, Australia 备注:Accepted at ICCV 2021 链接:https://arxiv.org/abs/2107.11635 摘要:提出了一种新的图像聚类框架,将联合表示学习和聚类相结合。我们的方法由共享同一主干网络的两个头组成,一个是“表示学习”头,一个是“聚类”头。“表示学习”头部在实例级捕获对象的细粒度模式,作为“聚类”头部提取粗粒度信息的线索,将对象分为多个簇。整个模型以端到端的方式训练,通过最小化应用于两个头部输出的两个面向样本的对比损失的加权和。为了保证“聚类”头对应的对比损失是最优的,我们引入了一个新的评判函数“点积对数”(log-of-dot-product)。大量的实验结果表明,我们的方法在各种图像数据集上显著优于最先进的单阶段聚类方法,在CIFAR10/20、STL10和ImageNet-Dogs上的准确率比最佳基线提高了约5-7%。此外,我们方法的“两阶段”变体在三个具有挑战性的ImageNet子集上也比基线获得更好的结果。 摘要:We propose a novel framework for image clustering that incorporates joint representation learning and clustering. Our method consists of two heads that share the same backbone network - a "representation learning" head and a "clustering" head. The "representation learning" head captures fine-grained patterns of objects at the instance level which serve as clues for the "clustering" head to extract coarse-grain information that separates objects into clusters. The whole model is trained in an end-to-end manner by minimizing the weighted sum of two sample-oriented contrastive losses applied to the outputs of the two heads. To ensure that the contrastive loss corresponding to the "clustering" head is optimal, we introduce a novel critic function called "log-of-dot-product". Extensive experimental results demonstrate that our method significantly outperforms state-of-the-art single-stage clustering methods across a variety of image datasets, improving over the best baseline by about 5-7% in accuracy on CIFAR10/20, STL10, and ImageNet-Dogs. Further, the "two-stage" variant of our method also achieves better results than baselines on three challenging ImageNet subsets.
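文中用于“聚类”头对比损失的评判函数“点积对数”(log-of-dot-product),其核心计算可示意如下(输入的聚类概率向量为演示假设):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def log_of_dot_product(p, q, eps=1e-12):
    """"聚类"头对比损失的评判函数:
    两个增广视图的聚类概率向量点积的对数。"""
    return np.log(np.dot(p, q) + eps)

p = softmax(np.array([4.0, 0.0, 0.0]))   # 两个视图都高度置信于簇0
q = softmax(np.array([5.0, 0.0, 0.0]))
r = softmax(np.array([0.0, 5.0, 0.0]))   # 被分到另一个簇的视图
print(log_of_dot_product(p, q) > log_of_dot_product(p, r))  # True
```

当同一图像的两个视图落在同一簇且分布尖锐时,点积接近1、对数接近0;分到不同簇时点积趋近0、对数趋向负无穷,因此该评判函数会同时鼓励跨视图一致与簇分配的置信。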

【46】 FedLab: A Flexible Federated Learning Framework 标题:FedLab:一种灵活的联合学习框架

作者:Dun Zeng,Siqi Liang,Xiangjing Hu,Zenglin Xu 机构:School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, China; School of Computer Science and Technology, Harbin Institute of Technology Shenzhen, Shenzhen, China 链接:https://arxiv.org/abs/2107.11621 摘要:联邦学习(FL)是一种应对隐私挑战的解决方案,它允许多方在不违反隐私保护规定的情况下训练共享模型。近年来,涌现出许多优秀的联邦学习研究工作。为了帮助研究者验证他们在联邦学习中的想法,我们设计并开发了基于PyTorch的灵活、模块化的联邦学习框架FedLab。本文将介绍FedLab的体系结构和特点。针对当前的研究热点(优化和通信压缩),FedLab提供了功能接口和一系列基线实现,使研究人员能够快速实现想法。此外,FedLab在客户端模拟和分布式通信方面都具有可扩展性。 摘要:Federated learning (FL) is a solution for privacy challenge, which allows multiparty to train a shared model without violating privacy protection regulations. Many excellent works of FL have been proposed in recent years. To help researchers verify their ideas in FL, we designed and developed FedLab, a flexible and modular FL framework based on PyTorch. In this paper, we will introduce architecture and features of FedLab. For current popular research points: optimization and communication compression, FedLab provides functional interfaces and a series of baseline implementation are available, making researchers quickly implement ideas. In addition, FedLab is scalable in both client simulation and distributed communication.

【47】 A Model-Agnostic Algorithm for Bayes Error Determination in Binary Classification 标题:一种模型无关的二值分类贝叶斯误差判定算法

作者:Umberto Michelucci,Michela Sperti,Dario Piga,Francesca Venturini,Marco A. Deriu 机构:TOELT llc, Birchlenstr., Dübendorf, Switzerland; PolitoBIOMed Lab, Department of Mechanical and Aerospace Engineering, Politecnico di Torino, Turin, Italy; Institute of Applied Mathematics and Physics, Zurich University of Applied Sciences 备注:21 pages 链接:https://arxiv.org/abs/2107.11609 摘要:本文提出了内禀极限确定算法(ILD算法),这是一种新技术,用于确定在具有分类特征的二分类问题中,无论使用何种模型,从特定数据集所能获得的最佳性能,该性能以AUC(ROC曲线下面积)和精确度来衡量。这个极限,即贝叶斯误差,完全独立于所使用的任何模型,描述的是数据集的内在属性。因此,ILD算法在应用于所考虑的数据集时,提供了关于任何二分类算法预测极限的重要信息。本文对该算法进行了详细描述,给出了其完整的数学框架,并给出了便于实现的伪代码。最后给出了一个真实数据集上的示例。 摘要:This paper presents the intrinsic limit determination algorithm (ILD Algorithm), a novel technique to determine the best possible performance, measured in terms of the AUC (area under the ROC curve) and accuracy, that can be obtained from a specific dataset in a binary classification problem with categorical features {\sl regardless} of the model used. This limit, namely the Bayes error, is completely independent of any model used and describes an intrinsic property of the dataset. The ILD algorithm thus provides important information regarding the prediction limits of any binary classification algorithm when applied to the considered dataset. In this paper the algorithm is described in detail, its entire mathematical framework is presented and the pseudocode is given to facilitate its implementation. Finally, an example with a real dataset is given.
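对离散(分类)特征的二分类问题,贝叶斯误差的经验插值估计可以按“对每种特征组合取两类后验中较小者并按组合出现频率加权求和”来计算。下面是一个纯Python草图(仅示意原理,并非ILD算法本身,其完整数学框架请参见原文):

```python
from collections import Counter

def bayes_error(samples):
    """二分类、分类特征下的经验贝叶斯误差:
    对每种不同的特征组合x,不可避免的误差为 min(P(y=0,x), P(y=1,x)) 对应的计数,
    总体下界为这些计数之和除以样本总数。"""
    counts = Counter(samples)                 # (x, y) -> 计数
    n = sum(counts.values())
    xs = {x for x, _ in counts}
    return sum(min(counts[(x, 0)], counts[(x, 1)]) for x in xs) / n

# 特征组合 'a' 存在歧义(3比1),组合 'b' 是纯净的。
data = [('a', 0)] * 3 + [('a', 1)] * 1 + [('b', 1)] * 4
print(bayes_error(data))  # 0.125
```

无论用什么模型,在该数据集上的错误率都不可能低于这个由数据本身决定的下界。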

【48】 The USYD-JD Speech Translation System for IWSLT 2021 标题:用于IWSLT 2021的USYD-JD语音翻译系统

作者:Liang Ding,Di Wu,Dacheng Tao 机构:The University of Sydney, Peking University 备注:IWSLT 2021 winning system of the low-resource speech translation track 链接:https://arxiv.org/abs/2107.11572 摘要:本文介绍了悉尼大学与京东(JD)联合提交的IWSLT 2021低资源语音翻译任务系统。我们参加了斯瓦希里语-英语方向的评测,并获得了所有参与者中最高的sacreBLEU分数(25.3)。我们的受限系统基于流水线框架,即ASR和NMT。我们用官方提供的ASR和MT数据集训练模型。ASR系统基于开源工具Kaldi,本文主要研究如何充分利用NMT模型。为了减少由ASR模型产生的标点错误,我们利用我们以前的工作SlotRefine训练了一个标点纠正模型。为了获得更好的翻译效果,我们探索了最新的有效翻译策略,包括回译、知识蒸馏、多特征重排序和直推式微调。在模型结构上,我们分别尝试了自回归模型和非自回归模型。此外,我们提出了两种新的预训练方法,即“去噪训练”和“双向训练”,以充分利用数据。大量实验表明,加入上述技术后,BLEU分数得到了持续的提升,最终提交系统的BLEU分数比基线(用原始并行数据训练的Transformer集成模型)提高了约10.8个BLEU,达到了SOTA性能。 摘要:This paper describes the University of Sydney & JD's joint submission of the IWSLT 2021 low resource speech translation task. We participated in the Swahili-English direction and got the best sacreBLEU (25.3) score among all the participants. Our constrained system is based on a pipeline framework, i.e. ASR and NMT. We trained our models with the officially provided ASR and MT datasets. The ASR system is based on the open-sourced tool Kaldi and this work mainly explores how to make the most of the NMT models. To reduce the punctuation errors generated by the ASR model, we employ our previous work SlotRefine to train a punctuation correction model. To achieve better translation performance, we explored the most recent effective strategies, including back translation, knowledge distillation, multi-feature reranking and transductive finetuning. For model structure, we tried auto-regressive and non-autoregressive models, respectively. In addition, we proposed two novel pre-train approaches, i.e. \textit{de-noising training} and \textit{bidirectional training} to fully exploit the data. 
Extensive experiments show that adding the above techniques consistently improves the BLEU scores, and the final submission system outperforms the baseline (Transformer ensemble model trained with the original parallel data) by approximately 10.8 BLEU score, achieving the SOTA performance.

【49】 Semantic-guided Pixel Sampling for Cloth-Changing Person Re-identification 标题:语义引导的像素采样在换布人再识别中的应用

作者:Xiujun Shu,Ge Li,Xiao Wang,Weijian Ruan,Qi Tian 机构: Ge Li)Xiujun Shu is with Peng Cheng Laboratory and Peking University, Ge Li is with the School of Electronic and Computer Engineering, PekingUniversity 备注:This paper has been published on IEEE Signal Processing Letters 链接:https://arxiv.org/abs/2107.11522 摘要:换衣人再识别(re-ID)是一个新兴的研究课题。这项任务相当具有挑战性,迄今尚未得到充分研究。目前的研究主要集中在体形或轮廓素描方面,但由于视点和姿态的变化,这些研究不够健壮。这项任务的关键是利用与布料无关的线索。本文提出了一种语义引导的像素采样方法,用于换衣人员身份识别任务。我们没有明确定义要提取的特征,而是强制模型自动学习不相关的线索。具体地说,我们首先识别行人的上衣和裤子,然后通过从其他行人身上采样像素来随机改变它们。更改后的样本保留了身份标签,但在不同的行人之间交换了衣服或裤子的像素。此外,我们采用损失函数来约束学习到的特征,以保持变化前后的一致性。这样,模特就被迫学习与上衣和裤子无关的线索。我们在最新发布的PRCC数据集上进行了广泛的实验。我们的方法在Rank1精度上达到65.8%,比以前的方法有很大的提高。代码可在https://github.com/shuxjweb/pixel_sampling.git. 摘要:Cloth-changing person re-identification (re-ID) is a new rising research topic that aims at retrieving pedestrians whose clothes are changed. This task is quite challenging and has not been fully studied to date. Current works mainly focus on body shape or contour sketch, but they are not robust enough due to view and posture variations. The key to this task is to exploit cloth-irrelevant cues. This paper proposes a semantic-guided pixel sampling approach for the cloth-changing person re-ID task. We do not explicitly define which feature to extract but force the model to automatically learn cloth-irrelevant cues. Specifically, we first recognize the pedestrian's upper clothes and pants, then randomly change them by sampling pixels from other pedestrians. The changed samples retain the identity labels but exchange the pixels of clothes or pants among different pedestrians. Besides, we adopt a loss function to constrain the learned features to keep consistent before and after changes. In this way, the model is forced to learn cues that are irrelevant to upper clothes and pants. We conduct extensive experiments on the latest released PRCC dataset. Our method achieved 65.8% on Rank1 accuracy, which outperforms previous methods with a large margin. 
The code is available at https://github.com/shuxjweb/pixel_sampling.git.

【50】 Caveats for the use of Web of Science Core Collection in old literature retrieval and historical bibliometric analysis 标题:Web of Science核心馆藏在老文献检索和历史文献计量学分析中的使用注意事项

作者:Weishu Liu 机构:School of Information Management and Artificial Intelligence, Zhejiang University of Finance and Economics 备注:None 链接:https://arxiv.org/abs/2107.11521 摘要:Fosso Wamba及其同事利用Web of Science核心合集(Web of Science Core Collection,WoSCC)中的出版物,在《Technological Forecasting and Social Change》上发表了一篇有趣而全面的论文,以探讨人工智能(AI)学术研究的结构与动态。Fosso Wamba研究中的数据表明,1991年似乎是人工智能研究的“分水岭”。本研究笔记试图从数据库局限性的角度解读1991年这一现象,方法是实证检验在WoSCC的摘要/作者关键词/Keywords Plus字段中进行检索的局限性。本文发现的WoSCC中摘要/作者关键词/Keywords Plus信息的低可得率,在很大程度上可以解释1991年人工智能学术研究的“分水岭”现象。讨论部分还提到了在旧文献检索和历史文献计量分析中使用WoSCC的其他一些注意事项。本研究笔记补充了Fosso Wamba及其同事的研究,也有助于避免在旧文献检索和历史文献计量分析中对WoSCC的不当使用与解读。 摘要:By using publications from Web of Science Core Collection (WoSCC), Fosso Wamba and his colleagues published an interesting and comprehensive paper in Technological Forecasting and Social Change to explore the structure and dynamics of artificial intelligence (AI) scholarship. Data demonstrated in Fosso Wamba's study implied that the year 1991 seemed to be a "watershed" of AI research. This research note tried to uncover the 1991 phenomenon from the perspective of database limitation by probing the limitations of search in abstract/author keywords/keywords plus fields of WoSCC empirically. The low availability rates of abstract/author keywords/keywords plus information in WoSCC found in this study can explain the "watershed" phenomenon of AI scholarship in 1991 to a large extent. Some other caveats for the use of WoSCC in old literature retrieval and historical bibliometric analysis were also mentioned in the discussion section. This research note complements Fosso Wamba and his colleagues' study and also helps avoid improper interpretation in the use of WoSCC in old literature retrieval and historical bibliometric analysis.

【51】 Crosslink-Net: Double-branch Encoder Segmentation Network via Fusing Vertical and Horizontal Convolutions 标题:Crosslink-Net:基于垂直和水平卷积融合的双分支编码器分段网络

作者:Qian Yu,Lei Qi,Luping Zhou,Lei Wang,Yilong Yin,Yinghuan Shi,Wuzhang Wang,Yang Gao 备注:13 pages, 12 figures 链接:https://arxiv.org/abs/2107.11517 摘要:准确的图像分割在医学图像分析中起着至关重要的作用,但它面临着形状多样、大小不一、边界模糊等诸多挑战。为了解决这些问题,人们提出了基于平方核的编解码结构,并得到了广泛的应用,但其性能仍然不尽如人意。为了进一步应对这些挑战,我们提出了一种新的双分支编码器结构。我们的架构受到两个观察的启发:1)由于通过平方卷积核学习的特征识别需要进一步改进,我们建议在双分支编码器中使用非平方垂直和水平卷积核,因此,这两个分支所学习的特征可以相互补充。2) 考虑到空间注意有助于模型更好地聚焦于大尺寸图像中的目标区域,我们提出了一种注意丢失的方法来进一步强调小尺寸目标的分割。结合上述两种方案,提出了一种新的用于医学图像分割的双分支编码器分割框架Crosslink-Net。在四个数据集上的实验验证了该模型的有效性。代码发布于https://github.com/Qianyu1226/Crosslink-Net. 摘要:Accurate image segmentation plays a crucial role in medical image analysis, yet it faces great challenges of various shapes, diverse sizes, and blurry boundaries. To address these difficulties, square kernel-based encoder-decoder architecture has been proposed and widely used, but its performance remains still unsatisfactory. To further cope with these challenges, we present a novel double-branch encoder architecture. Our architecture is inspired by two observations: 1) Since the discrimination of features learned via square convolutional kernels needs to be further improved, we propose to utilize non-square vertical and horizontal convolutional kernels in the double-branch encoder, so features learned by the two branches can be expected to complement each other. 2) Considering that spatial attention can help models to better focus on the target region in a large-sized image, we develop an attention loss to further emphasize the segmentation on small-sized targets. Together, the above two schemes give rise to a novel double-branch encoder segmentation framework for medical image segmentation, namely Crosslink-Net. The experiments validate the effectiveness of our model on four datasets. The code is released at https://github.com/Qianyu1226/Crosslink-Net.

【52】 Multi-Perspective Content Delivery Networks Security Framework Using Optimized Unsupervised Anomaly Detection 标题:基于优化无监督异常检测的多视角内容分发网络安全框架

作者:Li Yang,Abdallah Moubayed,Abdallah Shami,Parisa Heidari,Amine Boukhtouta,Adel Larabi,Richard Brunner,Stere Preda,Daniel Migault 机构:Western University 备注:Accepted and to Appear in IEEE Transactions on Network and Service Management 链接:https://arxiv.org/abs/2107.11514 摘要:内容交付网络(CDN)通过Internet提供高效的内容分发。cdn提高了全球通信的连接性和效率,但其缓存机制可能被网络攻击者破坏。在安全机制中,有效的异常检测是CDN安全增强的重要组成部分。在这项工作中,我们提出了一个多视角的无监督学习框架,用于CDNs中的异常检测。在该框架中,提出了一种多视角特征工程方法、一种基于隔离林和高斯混合模型的优化无监督异常检测模型以及一种多视角验证方法,主要从客户端互联网协议(IP)和节点的角度检测CDN中的异常行为,从而识别拒绝服务(DoS)和缓存污染攻击(CPA)模式。实验结果是基于一个主要CDN运营商提供的8天的真实CDN日志数据的分析。通过实验,该框架有效地识别了异常内容、受损节点、恶意ip及其相应的攻击类型,并得到了多个网络安全专家的验证。这表明了该方法在实际CDN数据处理中的有效性。 摘要:Content delivery networks (CDNs) provide efficient content distribution over the Internet. CDNs improve the connectivity and efficiency of global communications, but their caching mechanisms may be breached by cyber-attackers. Among the security mechanisms, effective anomaly detection forms an important part of CDN security enhancement. In this work, we propose a multi-perspective unsupervised learning framework for anomaly detection in CDNs. In the proposed framework, a multi-perspective feature engineering approach, an optimized unsupervised anomaly detection model that utilizes an isolation forest and a Gaussian mixture model, and a multi-perspective validation method, are developed to detect abnormal behaviors in CDNs mainly from the client Internet Protocol (IP) and node perspectives, therefore to identify the denial of service (DoS) and cache pollution attack (CPA) patterns. Experimental results are presented based on the analytics of eight days of real-world CDN log data provided by a major CDN operator. Through experiments, the abnormal contents, compromised nodes, malicious IPs, as well as their corresponding attack types, are identified effectively by the proposed framework and validated by multiple cybersecurity experts. This shows the effectiveness of the proposed method when applied to real-world CDN data.
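作为该框架中高斯混合模型打分思路的一个极简替代示意,下面用NumPy以单高斯(即单分量混合)的马氏距离作为异常分数(数据为合成的演示数据;真实框架还结合了孤立森林、多视角特征工程与多视角验证):

```python
import numpy as np

def gaussian_anomaly_scores(X):
    """论文中GMM组件的极简替身:在拟合的单高斯(单分量混合)下
    计算每个样本的马氏距离平方,分数越高越异常。
    真实框架将其与孤立森林的分数结合使用。"""
    mu, cov = X.mean(0), np.cov(X.T) + 1e-6 * np.eye(X.shape[1])
    inv = np.linalg.inv(cov)
    diff = X - mu
    return np.einsum('ij,jk,ik->i', diff, inv, diff)  # 批量二次型

rng = np.random.default_rng(0)
normal = rng.normal(0, 1, size=(200, 2))       # 例如按IP聚合的请求特征
X = np.vstack([normal, [[8.0, 8.0]]])          # 注入一个类似攻击的离群点
scores = gaussian_anomaly_scores(X)
print(int(np.argmax(scores)))  # 200 -> 注入的离群点得分最高
```

在此之上设定分位数阈值,即可把高分样本标记为可疑IP或节点,交给后续的多视角验证环节。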

【53】 Cycled Compositional Learning between Images and Text 标题:图文循环构图学习

作者:Jongseok Kim,Youngjae Yu,Seunghwan Lee,Gunhee Kim 机构:Seoul National University, RippleAI, Seoul, Korea 备注:Fashion IQ 2020 challenge winner. Workshop tech report 链接:https://arxiv.org/abs/2107.11509 摘要:提出了一种基于循环合成网络(Cycled Composition Network)的图像文本嵌入合成语义距离度量方法。首先,合成网络在嵌入空间中使用相对描述将参考图像迁移到目标图像。其次,校正网络计算嵌入空间中参考图像和检索到的目标图像之间的差异,并将其与相对描述匹配。我们的目标是用合成网络学习合成映射。由于这种单向映射是高度欠约束的,我们将其与校正网络的逆关系学习相结合,并为给定图像引入循环关系。我们参加了Fashion IQ 2020挑战赛,并通过我们的模型集成获得了第一名。 摘要:We present an approach named the Cycled Composition Network that can measure the semantic distance of the composition of image-text embedding. First, the Composition Network transits a reference image to a target image in an embedding space using a relative caption. Second, the Correction Network calculates a difference between reference and retrieved target images in the embedding space and matches it with the relative caption. Our goal is to learn a Composition mapping with the Composition Network. Since this one-way mapping is highly under-constrained, we couple it with an inverse relation learning with the Correction Network and introduce a cycled relation for given images. We participate in Fashion IQ 2020 challenge and have won the first place with the ensemble of our model.
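摘要中"合成网络与校正网络构成循环关系"可以用一个玩具例子说明(假设性示例:这里用嵌入空间中的加/减法充当两个网络,真实方法中二者都是学习得到的神经网络,嵌入也由多模态编码器产生):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 8
# 示意嵌入: 参考图像 r, 相对描述 c; 玩具设定下目标 t = r + c
r, c = rng.normal(size=d), rng.normal(size=d)
t = r + c

def compose(ref, cap):
    """合成网络的极简替身: 在嵌入空间做加性迁移(假设)。"""
    return ref + cap

def correct(ref, tgt):
    """校正网络的极简替身: 由参考与目标之差反推相对描述(假设)。"""
    return tgt - ref

t_hat = compose(r, c)                 # F(r, c) ≈ t
c_hat = correct(r, t_hat)             # G(r, F(r, c)) ≈ c, 即循环关系
cycle_loss = float(np.linalg.norm(c_hat - c))
print(cycle_loss)
```

训练时正是用这类循环一致性项来约束高度欠约束的单向合成映射。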

【54】 μDARTS: Model Uncertainty-Aware Differentiable Architecture Search 标题:μDARTS:模型不确定性感知可微分体系结构搜索

作者:Biswadeep Chakraborty,Saibal Mukhopadhyay 机构:Georgia Institute of Technology 备注:10 pages, 7 Tables, 6 Figures, Submitted in TNNLS 链接:https://arxiv.org/abs/2107.11500 摘要:我们提出了一种模型不确定性感知可微分结构搜索($\mu$DARTS),它在优化神经网络的同时实现高精度和低不确定性。我们在DARTS单元中引入了concrete dropout,并在训练损失中加入了Monte-Carlo正则化器来优化concrete dropout概率。在验证损失中引入了预测方差项,使得搜索模型不确定性最小的体系结构成为可能。在CIFAR10、CIFAR100、SVHN和ImageNet上的实验验证了$\mu$DARTS与现有DARTS方法相比在提高精度和降低不确定性方面的有效性。此外,与现有DARTS方法得到的结构相比,$\mu$DARTS得到的最终结构对输入图像和模型参数处的噪声具有更高的鲁棒性。 摘要:We present a Model Uncertainty-aware Differentiable ARchiTecture Search ($\mu$DARTS) that optimizes neural networks to simultaneously achieve high accuracy and low uncertainty. We introduce concrete dropout within DARTS cells and include a Monte-Carlo regularizer within the training loss to optimize the concrete dropout probabilities. A predictive variance term is introduced in the validation loss to enable searching for architecture with minimal model uncertainty. The experiments on CIFAR10, CIFAR100, SVHN, and ImageNet verify the effectiveness of $\mu$DARTS in improving accuracy and reducing uncertainty compared to existing DARTS methods. Moreover, the final architecture obtained from $\mu$DARTS shows higher robustness to noise at the input image and model parameters compared to the architecture obtained from existing DARTS methods.

【55】 Similarity Based Label Smoothing For Dialogue Generation 标题:基于相似度的标签平滑在对话生成中的应用

作者:Sougata Saha,Souvik Das,Rohini Srihari 机构:Department of Computer Science and Engineering, University at Buffalo, New York 链接:https://arxiv.org/abs/2107.11481 摘要:生成型神经会话系统的训练一般以训练硬目标和预测逻辑之间的熵损失最小为目标。通常,可以通过使用正则化技术(如标签平滑)来获得性能增益和改进的泛化,这种正则化技术将训练的“硬”目标转化为“软”目标。然而,标签平滑在不正确的训练目标上强制了一个与数据无关的均匀分布,这导致了对每个正确目标等概率不正确目标的错误假设。本文提出并实验了一种基于数据相关词相似度的加权方法,将标签平滑中错误目标概率的均匀分布转化为基于语义的更自然的分布。我们引入超参数来控制不正确的目标分布,并在两个标准的开放域对话语料库上报告了使用基于损失的标准标签平滑训练的网络的显著性能改进。 摘要:Generative neural conversational systems are generally trained with the objective of minimizing the entropy loss between the training "hard" targets and the predicted logits. Often, performance gains and improved generalization can be achieved by using regularization techniques like label smoothing, which converts the training "hard" targets to "soft" targets. However, label smoothing enforces a data independent uniform distribution on the incorrect training targets, which leads to an incorrect assumption of equi-probable incorrect targets for each correct target. In this paper we propose and experiment with incorporating data dependent word similarity based weighing methods to transforms the uniform distribution of the incorrect target probabilities in label smoothing, to a more natural distribution based on semantics. We introduce hyperparameters to control the incorrect target distribution, and report significant performance gains over networks trained using standard label smoothing based loss, on two standard open domain dialogue corpora.
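论文的核心想法——把标签平滑中分给错误目标的均匀概率质量,改为按词相似度加权——可以概括为如下示意(假设性示例:相似度矩阵与 eps 取值均为虚构,实际相似度可由词向量余弦相似度给出):

```python
import numpy as np

def similarity_label_smoothing(target_idx, sim, eps=0.1):
    """把硬目标转成软目标: 错误目标获得的 eps 概率质量
    按与正确词的相似度分配, 而非标准标签平滑的均匀分配。"""
    V = sim.shape[0]
    soft = np.zeros(V)
    soft[target_idx] = 1.0 - eps
    w = sim[target_idx].copy()
    w[target_idx] = 0.0              # 不把 eps 的质量分回正确目标
    soft += eps * w / w.sum()
    return soft

# 示意相似度矩阵 (假设), 词表大小 4; 词0与词1语义相近
sim = np.array([[1.0, 0.8, 0.1, 0.1],
                [0.8, 1.0, 0.1, 0.1],
                [0.1, 0.1, 1.0, 0.5],
                [0.1, 0.1, 0.5, 1.0]])
soft = similarity_label_smoothing(0, sim, eps=0.1)
print(soft)   # 与正确词相似的词1分到更多概率质量
```

与标准标签平滑相比,语义相近的错误目标(词1)获得了比无关目标(词2、词3)更高的软概率。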

【56】 Free Hyperbolic Neural Networks with Limited Radii 标题:有限半径自由双曲神经网络

作者:Yunhui Guo,Xudong Wang,Yubei Chen,Stella X. Yu 机构:UC Berkeley ICSI †, Facebook AI Research ‡ 备注:17 pages 链接:https://arxiv.org/abs/2107.11472 摘要:具有常负曲率的非欧几里德几何,即双曲空间,在机器学习领域引起了广泛的关注。双曲空间由于其连续嵌入层次结构的能力和低失真,已经被应用于树型结构的数据学习。直接在双曲空间中工作的双曲型神经网络(HNNs)最近也被提出,以进一步挖掘双曲表示的潜力。尽管HNNs在隐式层次结构的数据集上取得了比欧几里德神经网络(ENNs)更好的性能,但在CIFAR和ImageNet等标准分类基准上的性能仍然很差。传统的观点是,在应用HNNs时,数据尊重双曲几何是至关重要的。在本文中,我们首先进行了一项实证研究表明,HNNs在标准识别数据集上的较差性能可归因于臭名昭著的消失梯度问题。我们进一步发现,这个问题源于HNNs的混合体系结构。我们的分析导致了一个简单而有效的解决方案称为特征剪辑,它正则化双曲嵌入时,其范数超过给定的阈值。实验结果表明,该方法能有效地避免反向传播训练hnn时的消失梯度问题。改进后的HNNs能够在MNIST、CIFAR10、CIFAR100和ImageNet等标准图像识别数据集上实现与ENNs相当的性能,同时表现出更强的对抗鲁棒性和更强的分布外检测能力。 摘要:Non-Euclidean geometry with constant negative curvature, i.e., hyperbolic space, has attracted sustained attention in the community of machine learning. Hyperbolic space, owing to its ability to embed hierarchical structures continuously with low distortion, has been applied for learning data with tree-like structures. Hyperbolic Neural Networks (HNNs) that operate directly in hyperbolic space have also been proposed recently to further exploit the potential of hyperbolic representations. While HNNs have achieved better performance than Euclidean neural networks (ENNs) on datasets with implicit hierarchical structure, they still perform poorly on standard classification benchmarks such as CIFAR and ImageNet. The traditional wisdom is that it is critical for the data to respect the hyperbolic geometry when applying HNNs. In this paper, we first conduct an empirical study showing that the inferior performance of HNNs on standard recognition datasets can be attributed to the notorious vanishing gradient problem. We further discovered that this problem stems from the hybrid architecture of HNNs. Our analysis leads to a simple yet effective solution called Feature Clipping, which regularizes the hyperbolic embedding whenever its norm exceeding a given threshold. 
Our thorough experiments show that the proposed method can successfully avoid the vanishing gradient problem when training HNNs with backpropagation. The improved HNNs are able to achieve comparable performance with ENNs on standard image recognition datasets including MNIST, CIFAR10, CIFAR100 and ImageNet, while demonstrating more adversarial robustness and stronger out-of-distribution detection capability.
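文中的 Feature Clipping 思想本身非常简单:当嵌入范数超过阈值时按比例缩回,以避免嵌入过于靠近庞加莱球边界而引发梯度消失。下面是一个基于 numpy 的示意实现(假设性示例:阈值 r 与 eps 的取值仅作演示,并非论文的确切超参数):

```python
import numpy as np

def feature_clip(x, r=1.0, eps=1e-5):
    """当嵌入范数超过阈值 r 时整体缩放回阈值附近, 否则原样保留。"""
    norm = np.linalg.norm(x, axis=-1, keepdims=True)
    scale = np.minimum(1.0, (r - eps) / np.maximum(norm, eps))
    return x * scale

x = np.array([[0.3, 0.4],     # 范数 0.5, 不触发裁剪
              [3.0, 4.0]])    # 范数 5.0, 被缩放到阈值附近
y = feature_clip(x, r=1.0)
print(np.linalg.norm(y, axis=-1))
```

裁剪后所有嵌入的范数都不超过阈值,而阈值以内的嵌入不受影响。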

【57】 Deep Learning Based Cardiac MRI Segmentation: Do We Need Experts? 标题:基于深度学习的心脏MRI分割:我们需要专家吗?

作者:Youssef Skandarani,Pierre-Marc Jodoin,Alain Lalande 备注:None 链接:https://arxiv.org/abs/2107.11447 摘要:深度学习方法是解决医学图像分析问题的事实标准。心脏MRI分割就是这样一种应用,它和其他许多应用一样,需要大量的标注数据,这样训练好的网络才能很好地泛化。不幸的是,由医学专家人工标注大量图像的过程既慢又昂贵。在这篇论文中,我们着手探讨专家知识是否是创建机器学习可以成功训练的标注数据集的严格要求。为此,我们评估了三种分割模型(U-Net、Attention U-Net和ENet)在专家和非专家真值标注上用不同损失函数训练后的心脏电影MRI分割性能。评估采用经典分割指标(Dice指数和Hausdorff距离)以及临床测量,如心室射血分数和心肌质量。结果表明,在非专家真值数据上训练的分割神经网络的泛化性能与在专家真值数据上的泛化性能几乎相当,特别是当非专家得到相当程度的训练时,这凸显了为心脏数据集高效廉价地创建标注的机会。 摘要:Deep learning methods are the de-facto solutions to a multitude of medical image analysis tasks. Cardiac MRI segmentation is one such application which, like many others, requires a large number of annotated data so a trained network can generalize well. Unfortunately, the process of having a large number of manually curated images by medical experts is both slow and utterly expensive. In this paper, we set out to explore whether expert knowledge is a strict requirement for the creation of annotated datasets that machine learning can successfully train on. To do so, we gauged the performance of three segmentation models, namely U-Net, Attention U-Net, and ENet, trained with different loss functions on expert and non-expert groundtruth for cardiac cine-MRI segmentation. Evaluation was done with classic segmentation metrics (Dice index and Hausdorff distance) as well as clinical measurements, such as the ventricular ejection fractions and the myocardial mass. Results reveal that generalization performances of a segmentation neural network trained on non-expert groundtruth data is, to all practical purposes, as good as on expert groundtruth data, in particular when the non-expert gets a decent level of training, highlighting an opportunity for the efficient and cheap creation of annotations for cardiac datasets.
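摘要中用于评估的 Dice 指数定义为 2|A∩B|/(|A|+|B|),对二值分割掩码可以这样计算(示意代码,掩码为虚构的小例子):

```python
import numpy as np

def dice_index(pred, gt, eps=1e-8):
    """Dice 系数: 2|A∩B| / (|A|+|B|), 取值 [0,1], 越大越好。"""
    inter = np.logical_and(pred, gt).sum()
    return (2.0 * inter + eps) / (pred.sum() + gt.sum() + eps)

gt   = np.zeros((4, 4), bool); gt[1:3, 1:3] = True     # 真值: 4 个像素
pred = np.zeros((4, 4), bool); pred[1:3, 1:4] = True   # 预测: 6 个像素, 交集 4
print(dice_index(pred, gt))   # 2*4 / (6+4) = 0.8
```

论文中另一个指标 Hausdorff 距离衡量两个掩码边界之间的最大最近点距离,对轮廓偏差更敏感。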

【58】 Cooperative Exploration for Multi-Agent Deep Reinforcement Learning 标题:多Agent深度强化学习的协作探索

作者:Iou-Jen Liu,Unnat Jain,Raymond A. Yeh,Alexander G. Schwing 机构:University of Illinois at Urbana-Champaign 备注:ICML 2021; Project Page: this https URL 链接:https://arxiv.org/abs/2107.11444 摘要:探索对于深度强化学习取得良好效果至关重要,并引起了广泛的关注。然而,现有的多智能体深度强化学习算法大多仍采用基于噪声的探索技术。最近,一些考虑多个智能体之间合作的探索方法已经被开发出来。然而,现有的方法面临着一个共同的挑战:智能体很难识别出值得探索的状态,并且很难协调对这些状态的探索工作。针对这一不足,本文提出了协作多智能体探索(CMAE):智能体在探索过程中共享一个共同的目标。该目标通过基于归一化熵的方法从多个投影状态空间中选择。然后,训练智能体以协调的方式达到这个目标。我们证明了CMAE在各种任务上的表现始终优于基线,包括稀疏奖励版本的多粒子环境(MPE)和星际争霸多智能体挑战(SMAC)。 摘要:Exploration is critical for good results in deep reinforcement learning and has attracted much attention. However, existing multi-agent deep reinforcement learning algorithms still use mostly noise-based techniques. Very recently, exploration methods that consider cooperation among multiple agents have been developed. However, existing methods suffer from a common challenge: agents struggle to identify states that are worth exploring, and hardly coordinate exploration efforts toward those states. To address this shortcoming, in this paper, we propose cooperative multi-agent exploration (CMAE): agents share a common goal while exploring. The goal is selected from multiple projected state spaces via a normalized entropy-based technique. Then, agents are trained to reach this goal in a coordinated manner. We demonstrate that CMAE consistently outperforms baselines on various tasks, including a sparse-reward version of the multiple-particle environment (MPE) and the Starcraft multi-agent challenge (SMAC).
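"通过基于归一化熵的方法从多个投影状态空间中选择目标"的一种可能理解如下(假设性示例:这里假定访问计数熵越低说明该投影空间探索越不充分、越值得设定共享目标,计数数据为虚构):

```python
import numpy as np

def normalized_entropy(counts):
    """访问计数分布的归一化熵: 除以 log K 后取值落在 [0,1]。"""
    p = counts / counts.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

# 两个投影状态空间的访问计数 (示意)
space_a = np.array([50, 48, 51, 49])     # 接近均匀, 熵高, 已充分探索
space_b = np.array([190, 4, 3, 1])       # 高度集中, 熵低, 探索不足
h = [normalized_entropy(c) for c in (space_a, space_b)]
goal_space = int(np.argmin(h))           # 在熵最低的空间中设定共享探索目标
print(goal_space, h)
```

选出目标空间后,CMAE 再训练各智能体协调地到达该空间中的目标状态。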

【59】 Compressing Neural Networks: Towards Determining the Optimal Layer-wise Decomposition 标题:压缩神经网络:朝着确定最佳分层分解的方向发展

作者:Lucas Liebenwein,Alaa Maalouf,Oren Gal,Dan Feldman,Daniela Rus 机构:MIT CSAIL, University of Haifa 链接:https://arxiv.org/abs/2107.11442 摘要:我们提出了一种新的深度神经网络全局压缩框架,该框架自动分析每一层,以确定最佳的每一层压缩比,同时实现所需的整体压缩。我们的算法依赖于压缩每个卷积(或完全连接)层的思想,通过将其信道分为多个组,并通过低秩分解对每个组进行分解。该算法的核心是从Eckart-Young-Mirsky定理推导出分层误差界。然后,我们利用这些边界将压缩问题框架化为一个优化问题,在这个优化问题中,我们希望最小化跨层的最大压缩错误,并提出一个有效的算法来解决这个问题。我们的实验表明,我们的方法优于现有的低秩压缩方法在广泛的网络和数据集。我们相信,我们的研究结果为将来研究现代神经网络的全局性能-规模权衡开辟了新的途径。我们的代码在https://github.com/lucaslie/torchprune. 摘要:We present a novel global compression framework for deep neural networks that automatically analyzes each layer to identify the optimal per-layer compression ratio, while simultaneously achieving the desired overall compression. Our algorithm hinges on the idea of compressing each convolutional (or fully-connected) layer by slicing its channels into multiple groups and decomposing each group via low-rank decomposition. At the core of our algorithm is the derivation of layer-wise error bounds from the Eckart Young Mirsky theorem. We then leverage these bounds to frame the compression problem as an optimization problem where we wish to minimize the maximum compression error across layers and propose an efficient algorithm towards a solution. Our experiments indicate that our method outperforms existing low-rank compression approaches across a wide range of networks and data sets. We believe that our results open up new avenues for future research into the global performance-size trade-offs of modern neural networks. Our code is available at https://github.com/lucaslie/torchprune.
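摘要提到的"低秩分解 + 由 Eckart-Young-Mirsky 定理导出的逐层误差界"可以用截断 SVD 演示(示意代码:对单个权重矩阵做秩 k 近似,并验证 Frobenius 误差恰为被截断奇异值的平方和开根;论文在此之上还做了通道分组与跨层误差的全局优化):

```python
import numpy as np

def low_rank_compress(W, k):
    """截断 SVD: 把 m×n 权重矩阵分解成 m×k 与 k×n 两个小矩阵。
    Eckart-Young-Mirsky 定理保证这是秩 k 近似的最优解。"""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]              # m×k
    B = Vt[:k, :]                     # k×n
    err = np.linalg.norm(W - A @ B)   # Frobenius 近似误差
    return A, B, err, s

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 12))
A, B, err, s = low_rank_compress(W, k=4)
# 误差界: ||W - W_k||_F = sqrt(sum_{i>k} s_i^2)
print(err, np.sqrt((s[4:] ** 2).sum()))
```

正是这种逐层可解析计算的误差,使得论文能把"各层该压多少"表述为最小化跨层最大压缩误差的优化问题。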

【60】 HierMUD: Hierarchical Multi-task Unsupervised Domain Adaptation between Bridges for Drive-by Damage Diagnosis 标题:HierMUD:桥间分层多任务无监督领域自适应驾车损伤诊断

作者:Jingxiao Liu,Susu Xu,Mario Bergés,Hae Young Noh 机构:Stanford University, USA; Stony Brook University, USA; Carnegie Mellon University, USA 链接:https://arxiv.org/abs/2107.11435 摘要:利用过往车辆(drive-by)的振动监测桥梁健康状况有很多好处,例如不需要在桥梁上直接安装和维护传感器。然而,许多现有的 drive-by 监测方法都基于有监督学习模型,需要从每座感兴趣的桥上获取标记数据,而获取这些数据即便可能,也既昂贵又耗时。为此,我们引入了一个新的框架,将从一座桥上学习到的模型迁移到另一座桥的损伤诊断上,而不需要目标桥的任何标签。我们的框架以对抗的方式训练一个层次化的神经网络模型,以提取任务共享和任务特定的特征,这些特征对多个诊断任务有用,并且跨多座桥保持不变。我们在2座桥和3辆车的实验数据上评估了我们的框架。损伤检测准确率达95%,损伤定位达93%,损伤量化最高达72%,约为基线方法的2倍。 摘要:Monitoring bridge health using vibrations of drive-by vehicles has various benefits, such as no need for directly installing and maintaining sensors on the bridge. However, many of the existing drive-by monitoring approaches are based on supervised learning models that require labeled data from every bridge of interest, which is expensive and time-consuming, if not impossible, to obtain. To this end, we introduce a new framework that transfers the model learned from one bridge to diagnose damage in another bridge without any labels from the target bridge. Our framework trains a hierarchical neural network model in an adversarial way to extract task-shared and task-specific features that are informative to multiple diagnostic tasks and invariant across multiple bridges. We evaluate our framework on experimental data collected from 2 bridges and 3 vehicles. We achieve accuracies of 95% for damage detection, 93% for localization, and up to 72% for quantification, which are ~2 times improvements from baseline methods.

【61】 Robust Explainability: A Tutorial on Gradient-Based Attribution Methods for Deep Neural Networks 标题:稳健可解释性:深度神经网络基于梯度的属性方法教程

作者:Ian E. Nielsen,Ghulam Rasool,Dimah Dera,Nidhal Bouaynaya,Ravi P. Ramachandran 机构: Rowan University, The University of Texas Rio Grande Valley 备注:21 pages, 3 figures 链接:https://arxiv.org/abs/2107.11400 摘要:随着深层神经网络的兴起,人们越来越认识到解释这些网络预测的挑战。虽然有许多方法可以解释深层神经网络的决策,但目前对于如何评价它们还没有共识。另一方面,稳健性是深度学习研究的热门话题;然而,直到最近才有人在可解释性方面谈论它。在本教程中,我们首先介绍基于梯度的可解释性方法。这些技术使用梯度信号来分配输入特征的决策负担。之后,我们将讨论如何评估基于梯度的方法的稳健性,以及对抗性稳健性在有意义的解释中所起的作用。我们还讨论了基于梯度的方法的局限性。最后,我们给出了在选择可解释性方法之前应该检查的最佳实践和属性。最后,我们在稳健性和可解释性的收敛性方面提出了该领域未来的研究方向。 摘要:With the rise of deep neural networks, the challenge of explaining the predictions of these networks has become increasingly recognized. While many methods for explaining the decisions of deep neural networks exist, there is currently no consensus on how to evaluate them. On the other hand, robustness is a popular topic for deep learning research; however, it is hardly talked about in explainability until very recently. In this tutorial paper, we start by presenting gradient-based interpretability methods. These techniques use gradient signals to assign the burden of the decision on the input features. Later, we discuss how gradient-based methods can be evaluated for their robustness and the role that adversarial robustness plays in having meaningful explanations. We also discuss the limitations of gradient-based methods. Finally, we present the best practices and attributes that should be examined before choosing an explainability method. We conclude with the future directions for research in the area at the convergence of robustness and explainability.
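教程所述"基于梯度的归因"最朴素的形式,是把输出对每个输入特征的梯度绝对值作为重要性得分。下面用中心差分给出一个与框架无关的数值示意(假设性示例:用玩具函数代替真实网络,真实实现通常直接用自动微分取梯度):

```python
import numpy as np

def saliency(f, x, h=1e-5):
    """用中心差分近似 |∂f/∂x_i|: 数值越大, 该输入特征对输出影响越大。"""
    g = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2 * h)
    return np.abs(g)

# 玩具"模型": 输出只依赖前两个特征
f = lambda x: 3.0 * x[0] + 0.5 * x[1] ** 2
x = np.array([1.0, 2.0, 5.0])
attr = saliency(f, x)
print(attr)   # 第三个特征的归因应为 0
```

教程讨论的稳健性问题正体现在此:这类梯度得分对输入的微小(对抗性)扰动可能非常敏感。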

【62】 Belief Propagation as Diffusion 标题:信仰传播是一种传播

作者:Olivier Peltre 机构:Université d’Artois, Faculté Jean Perrin (LML), Rue Jean Souvraz , LENS CEDEX 备注:None 链接:https://arxiv.org/abs/2107.12230 摘要:我们引入新的信念传播算法来估计高维概率分布的边缘。它们涉及与统计系统的局部描述相关的自然(共)同调结构。 摘要:We introduce novel belief propagation algorithms to estimate the marginals of a high dimensional probability distribution. They involve natural (co)homological constructions relevant for a localised description of statistical systems.

【63】 Adaptation of Tacotron2-based Text-To-Speech for Articulatory-to-Acoustic Mapping using Ultrasound Tongue Imaging 标题:基于Taco tron2的文本到语音的适配,用于超声舌象关节声学映射

作者:Csaba Zainkó,László Tóth,Amin Honarmandi Shandiz,Gábor Gosztolya,Alexandra Markó,Géza Németh,Tamás Gábor Csapó 机构:Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary; Institute of Informatics, University of Szeged, Hungary 备注:accepted at SSW11. arXiv admin note: text overlap with arXiv:2008.03152 链接:https://arxiv.org/abs/2107.12051 摘要:对于发音到声学的映射,通常只有有限的并行训练数据可用,因此无法应用像Tacotron2这样完全端到端的解决方案。在这篇论文中,我们尝试对Tacotron2文本转语音模型进行迁移学习和自适应,以在有限数据库条件下提高基于超声的发音到声学映射的最终合成质量。我们使用多说话人预训练的Tacotron2 TTS模型和预训练的WaveGlow神经声码器。发音到声学的转换包括三个步骤:1)从一系列超声舌像记录出发,由一个三维卷积神经网络预测预训练Tacotron2模型的输入;2)Tacotron2模型将这个中间表示转换成80维mel谱图;3)采用WaveGlow模型进行最终推理。生成的语音保留了超声记录中原始发音数据的时序,但F0轮廓和频谱信息由Tacotron2模型预测。F0值与原始超声图像无关,但代表目标说话人,因为它们是从预训练的Tacotron2模型推断出来的。在我们的实验中,我们证明了所提方案合成的语音质量比我们早期的模型更自然。 摘要:For articulatory-to-acoustic mapping, typically only limited parallel training data is available, making it impossible to apply fully end-to-end solutions like Tacotron2. In this paper, we experimented with transfer learning and adaptation of a Tacotron2 text-to-speech model to improve the final synthesis quality of ultrasound-based articulatory-to-acoustic mapping with a limited database. We use a multi-speaker pre-trained Tacotron2 TTS model and a pre-trained WaveGlow neural vocoder. The articulatory-to-acoustic conversion contains three steps: 1) from a sequence of ultrasound tongue image recordings, a 3D convolutional neural network predicts the inputs of the pre-trained Tacotron2 model, 2) the Tacotron2 model converts this intermediate representation to an 80-dimensional mel-spectrogram, and 3) the WaveGlow model is applied for final inference. This generated speech contains the timing of the original articulatory data from the ultrasound recording, but the F0 contour and the spectral information is predicted by the Tacotron2 model.
The F0 values are independent of the original ultrasound images, but represent the target speaker, as they are inferred from the pre-trained Tacotron2 model. In our experiments, we demonstrated that the synthesized speech quality is more natural with the proposed solutions than with our earlier model.

【64】 A Survey of Monte Carlo Methods for Parameter Estimation 标题:参数估计的蒙特卡罗方法综述

作者:D. Luengo,L. Martino,M. Bugallo,V. Elvira,S. Särkkä 机构:for parameter estimation”. EURASIP Journal on Advances in Signal Processing, Article, number: , (,)., A Survey of Monte Carlo Methods, for Parameter Estimation, Universidad Polit´ecnica de Madrid (UPM), Spain., Universidad Rey Juan Carlos (URJC), Spain. 备注:None 链接:https://arxiv.org/abs/2107.11820 摘要:统计信号处理应用通常需要估计给定一组观测数据的一些相关参数。这些估计通常是通过求解多变量优化问题获得的,如在最大似然(ML)或最大后验概率(MAP)估计中,或者通过执行多维积分获得的,如在最小均方误差(MMSE)估计中。不幸的是,在大多数实际应用中找不到这些估计量的解析表达式,蒙特卡罗方法是一种可行的方法。MC方法通过从期望分布或更简单的分布中抽取随机样本来计算一致估计量。最重要的MC算法家族是Markov链MC(MCMC)和重要性抽样(IS)。一方面,MCMC方法从一个建议密度中抽取样本,然后通过接受或拒绝这些候选样本作为链的新状态,建立一个遍历马尔可夫链,其平稳分布就是期望分布。另一方面,IS技术从一个简单的建议密度中抽取样本,然后给它们分配适当的权重,以某种适当的方式度量它们的质量。本文对信号处理中静态参数估计的MC方法进行了全面的综述。本文还提供了MC方案发展的历史记录,接着是基本MC方法和拒绝采样(RS)算法的简要描述,以及描述许多最相关的MCMC和is算法及其组合使用的三个部分。 摘要:Statistical signal processing applications usually require the estimation of some parameters of interest given a set of observed data. These estimates are typically obtained either by solving a multi-variate optimization problem, as in the maximum likelihood (ML) or maximum a posteriori (MAP) estimators, or by performing a multi-dimensional integration, as in the minimum mean squared error (MMSE) estimators. Unfortunately, analytical expressions for these estimators cannot be found in most real-world applications, and the Monte Carlo (MC) methodology is one feasible approach. MC methods proceed by drawing random samples, either from the desired distribution or from a simpler one, and using them to compute consistent estimators. The most important families of MC algorithms are Markov chain MC (MCMC) and importance sampling (IS). On the one hand, MCMC methods draw samples from a proposal density, building then an ergodic Markov chain whose stationary distribution is the desired distribution by accepting or rejecting those candidate samples as the new state of the chain. 
On the other hand, IS techniques draw samples from a simple proposal density, and then assign them suitable weights that measure their quality in some appropriate way. In this paper, we perform a thorough review of MC methods for the estimation of static parameters in signal processing applications. A historical note on the development of MC schemes is also provided, followed by the basic MC method and a brief description of the rejection sampling (RS) algorithm, as well as three sections describing many of the most relevant MCMC and IS algorithms, and their combined use.
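综述中提到的拒绝采样(RS)是最基本的 MC 方法之一:从提议分布抽样,以 p(x)/(M q(x)) 的概率接受,前提是对所有 x 有 p(x) ≤ M q(x)。下面给出一个示意实现(假设性示例:以 Beta(2,2) 为目标分布、U(0,1) 为提议分布,M 取目标密度的上确界 1.5):

```python
import numpy as np

def rejection_sampling(target_pdf, proposal_rvs, proposal_pdf, M, n, rng):
    """拒绝采样: 从提议分布抽样, 以 p(x)/(M q(x)) 的概率接受。"""
    out = []
    while len(out) < n:
        x = proposal_rvs(rng)
        if rng.uniform() < target_pdf(x) / (M * proposal_pdf(x)):
            out.append(x)
    return np.array(out)

# 目标: Beta(2,2) 密度 p(x) = 6x(1-x); 提议: U(0,1), q(x) = 1
p = lambda x: 6.0 * x * (1.0 - x)
q = lambda x: 1.0
rng = np.random.default_rng(0)
samples = rejection_sampling(p, lambda r: r.uniform(), q, M=1.5, n=5000, rng=rng)
print(samples.mean())   # Beta(2,2) 的均值为 0.5
```

接受率为 1/M,因此 M 越贴近密度比的上确界,采样越高效——这正是综述中讨论各类 IS/MCMC 改进的出发点之一。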

【65】 Sensitivity and robustness analysis in Bayesian networks with the bnmonitor R package 标题:基于bnmonitor R包的贝叶斯网络灵敏度和鲁棒性分析

作者:Manuele Leonelli,Ramsiya Ramanathan,Rachel L. Wilkerson 链接:https://arxiv.org/abs/2107.11785 摘要:贝叶斯网络是一类广泛应用于复杂操作系统风险评估的模型。现在有多种方法,以及实现的软件,通过数据学习或专家启发来指导它们的构建。然而,构建的贝叶斯网络在用于实际风险评估之前需要经过验证。这里,我们举例说明bnmonitor包的用法:第一个用于验证贝叶斯网络的综合软件。利用bnmonitor对一个医学数据集进行了应用数据分析,说明了bnmonitor的多种功能。 摘要:Bayesian networks are a class of models that are widely used for risk assessment of complex operational systems. There are now multiple approaches, as well as implemented software, that guide their construction via data learning or expert elicitation. However, a constructed Bayesian network needs to be validated before it can be used for practical risk assessment. Here, we illustrate the usage of the bnmonitor R package: the first comprehensive software for the validation of a Bayesian network. An applied data analysis using bnmonitor is carried out over a medical dataset to illustrate the use of its wide array of functions.

【66】 Efficient QUBO transformation for Higher Degree Pseudo Boolean Functions 标题:高次伪布尔函数的有效Qubo变换

作者:Amit Verma,Mark Lewis,Gary Kochenberger 机构:Kochenberger, Received: date Accepted: date 备注:Preprint submitted to Springer 链接:https://arxiv.org/abs/2107.11695 摘要:二次无约束二元优化(QUBO)被公认为一个统一的框架,用于建模广泛的问题。可以使用为求解QUBO而定制的商业解算器来解决问题,并且由于QUBO具有二级,因此有一种将更高阶伪布尔问题转换为QUBO格式的方法是很有用的。标准转换方法需要额外的辅助变量,这些辅助变量由每个高次项的惩罚项支持。本文在现有的三次到二次变换方法的基础上,通过最小化附加变量的个数和惩罚系数,对其进行了改进。在模拟为QUBO的Max-3-SAT上进行的大量实验测试表明,用于最小化辅助变量数量的子问题大小减少了近100%。 摘要:Quadratic Unconstrained Binary Optimization (QUBO) is recognized as a unifying framework for modeling a wide range of problems. Problems can be solved with commercial solvers customized for solving QUBO and since QUBO have degree two, it is useful to have a method for transforming higher degree pseudo-Boolean problems to QUBO format. The standard transformation approach requires additional auxiliary variables supported by penalty terms for each higher degree term. This paper improves on the existing cubic-to-quadratic transformation approach by minimizing the number of additional variables as well as penalty coefficient. Extensive experimental testing on Max 3-SAT modeled as QUBO shows a near 100% reduction in the subproblem size used for minimization of the number of auxiliary variables.
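文中作为改进对象的"标准变换方法为每个高次项引入辅助变量并配以惩罚项",对应经典的 Rosenberg 替换。下面逐点验证该替换的正确性(示意代码;论文的贡献在于进一步减少辅助变量个数与惩罚系数,此处只演示标准做法):

```python
import itertools

def cubic_term(x1, x2, x3):
    return x1 * x2 * x3

def quadratized(x1, x2, x3, y, P=2):
    """Rosenberg 替换: 用辅助变量 y 代替乘积 x1*x2,
    惩罚项 P*(x1*x2 - 2*x1*y - 2*x2*y + 3*y) 在 y = x1*x2 时取 0, 否则为正。"""
    penalty = P * (x1 * x2 - 2 * x1 * y - 2 * x2 * y + 3 * y)
    return y * x3 + penalty

# 逐点验证: 对每组 (x1,x2,x3), 在 y∈{0,1} 上取最小值应恢复原三次项的取值
ok = all(
    min(quadratized(a, b, c, y) for y in (0, 1)) == cubic_term(a, b, c)
    for a, b, c in itertools.product((0, 1), repeat=3)
)
print(ok)
```

这样每个三次项化为二次项只需一个辅助变量;更高次的项可以递归套用同一替换,代价是辅助变量与惩罚系数随之增长,而这正是论文设法压缩的开销。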

【67】 Automatic tempered posterior distributions for Bayesian inversion problems 标题:贝叶斯反演问题的自动调和后验分布

作者:L. Martino,F. Llorente,E. Curbelo,J. Lopez-Santiago,J. Miguez 机构:Universidad Rey Juan Carlos (URJC), Madrid, Spain; Universidad Carlos III de Madrid (UC3M), Madrid, Spain 备注:None 链接:https://arxiv.org/abs/2107.11614 摘要:针对贝叶斯反演问题,提出了一种新的自适应重要性抽样方案,该方案将感兴趣变量的推断和数据噪声功率的估计分开。更具体地,我们对感兴趣的变量(即待反演模型的参数)进行贝叶斯分析,而采用最大似然方法估计噪声功率。整个技术通过迭代过程实现,交替进行采样和优化步骤。此外,噪声功率还被用作感兴趣变量后验分布的回火(tempering)参数。因此,生成一个回火后验密度序列,其中回火参数根据噪声功率的当前估计自动选择。还可以对模型参数和尺度参数进行完整的贝叶斯研究。数值实验表明了该方法的优点。 摘要:We propose a novel adaptive importance sampling scheme for Bayesian inversion problems where the inference of the variables of interest and the power of the data noise is split. More specifically, we consider a Bayesian analysis for the variables of interest (i.e., the parameters of the model to invert), whereas we employ a maximum likelihood approach for the estimation of the noise power. The whole technique is implemented by means of an iterative procedure, alternating sampling and optimization steps. Moreover, the noise power is also used as a tempered parameter for the posterior distribution of the the variables of interest. Therefore, a sequence of tempered posterior densities is generated, where the tempered parameter is automatically selected according to the actual estimation of the noise power. A complete Bayesian study over the model parameters and the scale parameter can be also performed. Numerical experiments show the benefits of the proposed approach.

【68】 Plinko: A Theory-Free Behavioral Measure of Priors for Statistical Learning and Mental Model Updating 标题:Plinko:一种用于统计学习和心理模型更新的无理论先验行为测量

作者:Peter A. V. DiBerardino,Alexandre L. S. Filipowicz,James Danckert,Britt Anderson 机构:Department of Psychology, University of Waterloo, Waterloo, ON N2L 3G1 链接:https://arxiv.org/abs/2107.11477 摘要:概率分布是贝叶斯认知理论的核心,但行为评估并不能直接测量它们。后验分布通常是从个体参与者行为的集合中计算出来的,但也被用来得出关于参与者信念内部结构的结论。同样没有明确测量的是先验分布,它通过表示信念的初始状态将贝叶斯模型与其他模型区分开来。相反,先验通常来自实验者的直觉或模型假设,并平等地应用于所有参与者。在这里,我们使用"Plinko"进行了三个实验,这是一项行为任务,参与者在所有可用结果上估计落球的分布,并且这些分布在任何观察之前就被明确测量。在实验1中,我们发现参与者的先验聚集在典型的概率分布(高斯分布、双峰分布等)周围,而先验的聚类归属可能预示学习能力。在实验2中,我们强调了参与者对所呈现分布的未预告变化进行更新的能力,以及这种能力如何受到环境操纵的影响。 摘要:Probability distributions are central to Bayesian accounts of cognition, but behavioral assessments do not directly measure them. Posterior distributions are typically computed from collections of individual participant actions, yet are used to draw conclusions about the internal structure of participant beliefs. Also not explicitly measured are the prior distributions that distinguish Bayesian models from others by representing initial states of belief. Instead, priors are usually derived from experimenters' intuitions or model assumptions and applied equally to all participants. Here we present three experiments using "Plinko", a behavioral task in which participants estimate distributions of ball drops over all available outcomes and where distributions are explicitly measured before any observations. In Experiment 1, we show that participant priors cluster around prototypical probability distributions (Gaussian, bimodal, etc.), and that prior cluster membership may indicate learning ability. In Experiment 2, we highlight participants' ability to update to unannounced changes of presented distributions and how this ability is affected by environmental manipulation.
Finally, in Experiment 3, we verify that individual participant priors are reliable representations and that learning is not impeded when faced with a physically implausible ball drop distribution that is dynamically defined according to individual participant input. This task will prove useful in more closely examining mechanisms of statistical learning and mental model updating without requiring many of the assumptions made by more traditional computational modeling methodologies.

【69】 TargetNet: Functional microRNA Target Prediction with Deep Neural Networks 标题:TargetNet:基于深度神经网络的功能性microRNA靶标预测

作者:Seonwoo Min,Byunghan Lee,Sungroh Yoon 机构:Department of Electrical and Computer Engineering, Seoul National University, Seoul , South Korea, Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Seoul , South Korea 备注:7 pages, under review 链接:https://arxiv.org/abs/2107.11381 摘要:MicroRNAs(miRNAs)通过与信使rna(mRNAs)的靶位点结合,在基因表达调控中起着关键作用。虽然识别miRNAs的功能靶点是非常重要的,但是它们的预测仍然是一个巨大的挑战。以前的计算算法有很大的局限性。他们使用保守的候选靶位点(CTS)选择标准,主要集中在标准位点类型上,依赖费时费力的手工特征提取,并且没有充分利用miRNA-CTS相互作用的信息。本文介绍了一种基于深度学习的功能性miRNA靶点预测算法TargetNet。为了解决以前方法的局限性,TargetNet有三个关键组成部分:(1)宽松的CTS选择标准,以适应种子区域的不规则性,(2)一种新的miRNA-CTS序列编码方案,包括扩展的种子区域比对,和(3)一个基于深度残差网络的预测模型。该模型用miRNA-CTS对数据集进行训练,并用miRNA-mRNA对数据集进行评价。TargetNet改进了以前用于功能性miRNA目标分类的最新算法。此外,它在区分高功能miRNA靶点方面显示出巨大的潜力。 摘要:MicroRNAs (miRNAs) play pivotal roles in gene expression regulation by binding to target sites of messenger RNAs (mRNAs). While identifying functional targets of miRNAs is of utmost importance, their prediction remains a great challenge. Previous computational algorithms have major limitations. They use conservative candidate target site (CTS) selection criteria mainly focusing on canonical site types, rely on laborious and time-consuming manual feature extraction, and do not fully capitalize on the information underlying miRNA-CTS interactions. In this paper, we introduce TargetNet, a novel deep learning-based algorithm for functional miRNA target prediction. To address the limitations of previous approaches, TargetNet has three key components: (1) relaxed CTS selection criteria accommodating irregularities in the seed region, (2) a novel miRNA-CTS sequence encoding scheme incorporating extended seed region alignments, and (3) a deep residual network-based prediction model. The proposed model was trained with miRNA-CTS pair datasets and evaluated with miRNA-mRNA pair datasets. TargetNet advances the previous state-of-the-art algorithms used in functional miRNA target classification. 
Furthermore, it demonstrates great potential for distinguishing high-functional miRNA targets.

本文参与 腾讯云自媒体同步曝光计划,分享自微信公众号。
原始发表:2021-07-27,如有侵权请联系 cloudcommunity@tencent.com 删除
