
AI Academic Digest [11.10]

Source: WeChat official account arXiv每日学术速递 (arXiv Daily Academic Digest)
Published 2021-11-17 10:58:30

cs.AI (Artificial Intelligence), 46 papers in total

【1】 Turing-Universal Learners with Optimal Scaling Laws
Link: https://arxiv.org/abs/2111.05321

Authors: Preetum Nakkiran
Affiliations: Halıcıoğlu Data Science Institute, University of California San Diego
Abstract: For a given distribution, learning algorithm, and performance metric, the rate of convergence (or data-scaling law) is the asymptotic behavior of the algorithm's test performance as a function of the number of training samples. Many learning methods in both theory and practice have power-law rates, i.e. performance scales as $n^{-\alpha}$ for some $\alpha > 0$. Moreover, both theoreticians and practitioners are concerned with improving the rates of their learning algorithms under settings of interest. We observe the existence of a "universal learner", which achieves the best possible distribution-dependent asymptotic rate among all learning algorithms within a specified runtime (e.g. $O(n^2)$), while incurring only polylogarithmic slowdown over this runtime. This algorithm is uniform, and does not depend on the distribution, and yet achieves best-possible rates for all distributions. The construction itself is a simple extension of Levin's universal search (Levin, 1973). And much like universal search, the universal learner is not at all practical, and is primarily of theoretical and philosophical interest.
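
As a quick illustration of what a power-law scaling rate means (a generic example, not taken from the paper): if test error behaves as $c \cdot n^{-\alpha}$, then $\log(\text{error})$ is linear in $\log n$ with slope $-\alpha$, so the exponent can be read off a log-log fit.

```python
# Generic illustration (not from the paper): recover the scaling exponent
# alpha from synthetic error measurements that follow err = c * n^(-alpha).
import numpy as np

n = np.array([1e2, 1e3, 1e4, 1e5])
err = 2.0 * n ** -0.5                                  # alpha = 0.5, c = 2.0
alpha_hat = -np.polyfit(np.log(n), np.log(err), 1)[0]  # slope of log-log fit
print(round(alpha_hat, 3))                             # prints 0.5
```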

【2】 A Differentiable Recipe for Learning Visual Non-Prehensile Planar Manipulation
Link: https://arxiv.org/abs/2111.05318

Authors: Bernardo Aceituno, Alberto Rodriguez, Shubham Tulsiani, Abhinav Gupta, Mustafa Mukadam
Affiliations: Massachusetts Institute of Technology, Facebook AI Research
Notes: Presented at CoRL 2021
Abstract: Specifying tasks with videos is a powerful technique towards acquiring novel and general robot skills. However, reasoning over mechanics and dexterous interactions can make it challenging to scale learning contact-rich manipulation. In this work, we focus on the problem of visual non-prehensile planar manipulation: given a video of an object in planar motion, find contact-aware robot actions that reproduce the same object motion. We propose a novel architecture, Differentiable Learning for Manipulation (DLM), that combines video decoding neural models with priors from contact mechanics by leveraging differentiable optimization and finite difference based simulation. Through extensive simulated experiments, we investigate the interplay between traditional model-based techniques and modern deep learning approaches. We find that our modular and fully differentiable architecture performs better than learning-only methods on unseen objects and motions. Code: https://github.com/baceituno/dlm

【3】 Can Information Flows Suggest Targets for Interventions in Neural Circuits?
Link: https://arxiv.org/abs/2111.05299

Authors: Praveen Venkatesh, Sanghamitra Dutta, Neil Mehta, Pulkit Grover
Affiliations: Allen Institute; University of Washington, Seattle; JP Morgan Chase AI Research; Department of Electrical and Computer Engineering and Neuroscience Institute, Carnegie Mellon University
Notes: Accepted to the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021). (29 pages; 61 figures)
Abstract: Motivated by neuroscientific and clinical applications, we empirically examine whether observational measures of information flow can suggest interventions. We do so by performing experiments on artificial neural networks in the context of fairness in machine learning, where the goal is to induce fairness in the system through interventions. Using our recently developed $M$-information flow framework, we measure the flow of information about the true label (responsible for accuracy, and hence desirable), and separately, the flow of information about a protected attribute (responsible for bias, and hence undesirable) on the edges of a trained neural network. We then compare the flow magnitudes against the effect of intervening on those edges by pruning. We show that pruning edges that carry larger information flows about the protected attribute reduces bias at the output to a greater extent. This demonstrates that $M$-information flow can meaningfully suggest targets for interventions, answering the title's question in the affirmative. We also evaluate bias-accuracy tradeoffs for different intervention strategies, to analyze how one might use estimates of desirable and undesirable information flows (here, accuracy and bias flows) to inform interventions that preserve the former while reducing the latter.
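
A minimal sketch of the intervention the paper evaluates, assuming a per-edge flow estimate is already available (estimating $M$-information flow itself is beyond this snippet, so the `bias_flow` array below is a hypothetical stand-in):

```python
# Hedged sketch: prune the k edges of a weight matrix that carry the largest
# estimated flow about the protected attribute (bias_flow is assumed given).
import numpy as np

def prune_by_flow(weights, bias_flow, k):
    idx = np.argsort(bias_flow, axis=None)[-k:]  # k largest bias-flow edges
    pruned = weights.copy()
    pruned.flat[idx] = 0.0                       # intervention: cut the edge
    return pruned

W = np.random.randn(4, 3)
flow = np.abs(np.random.randn(4, 3))             # stand-in flow estimates
print(prune_by_flow(W, flow, k=2))
```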

【4】 Sliced Recursive Transformer
Link: https://arxiv.org/abs/2111.05297

Authors: Zhiqiang Shen, Zechun Liu, Eric Xing
Affiliations: CMU; MBZUAI
Notes: Code and models are available at https://github.com/szq0214/SReT
Abstract: We present a neat yet effective recursive operation on vision transformers that can improve parameter utilization without involving additional parameters. This is achieved by sharing weights across depth of transformer networks. The proposed method can obtain a substantial gain (~2%) simply using naïve recursive operation, requires no special or sophisticated knowledge for designing principles of networks, and introduces minimum computational overhead to the training procedure. To reduce the additional computation caused by recursive operation while maintaining the superior accuracy, we propose an approximating method through multiple sliced group self-attentions across recursive layers which can reduce the cost consumption by 10~30% with minimal performance loss. We call our model Sliced Recursive Transformer (SReT), which is compatible with a broad range of other designs for efficient vision transformers. Our best model establishes significant improvement on ImageNet over state-of-the-art methods while containing fewer parameters. The proposed sliced recursive operation allows us to build a transformer with more than 100 or even 1000 layers effortlessly under a still small size (13~15M), to avoid difficulties in optimization when the model size is too large. The flexible scalability has shown great potential for scaling up and constructing extremely deep and large dimensionality vision transformers. Our code and models are available at https://github.com/szq0214/SReT.
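
The core recursion amounts to reusing one transformer block several times along the depth dimension, adding effective depth without adding parameters. A minimal PyTorch sketch of that idea (generic, not the official SReT code, which further adds the sliced group self-attention approximation):

```python
# Sketch: naive recursion by weight sharing -- the same transformer block is
# applied several times, so depth grows but the parameter count does not.
import torch
import torch.nn as nn

class RecursiveBlock(nn.Module):
    def __init__(self, dim=256, heads=4, recursions=2):
        super().__init__()
        self.block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                                batch_first=True)
        self.recursions = recursions

    def forward(self, x):
        for _ in range(self.recursions):   # shared weights across depth
            x = self.block(x)
        return x

x = torch.randn(8, 197, 256)               # (batch, tokens, dim)
print(RecursiveBlock()(x).shape)            # torch.Size([8, 197, 256])
```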

【5】 Unsupervised Learning for Identifying High Eigenvector Centrality Nodes: A Graph Neural Network Approach
Link: https://arxiv.org/abs/2111.05264

Authors: Appan Rakaraddi, Mahardhika Pratama
Affiliations: School of Computer Science, NTU, Singapore
Notes: Accepted at IEEE BigData 2021
Abstract: The existing methods to calculate the Eigenvector Centrality (EC) tend to not be robust enough for determination of EC in low time complexity or not well-scalable for large networks, hence rendering them practically unreliable/computationally expensive. So, it is of the essence to develop a method that is scalable in low computational time. Hence, we propose a deep learning model for the identification of nodes with high Eigenvector Centrality. There have been a few previous works in identifying the high ranked nodes with supervised learning methods, but in real-world cases, the graphs are not labelled and hence deployment of supervised learning methods becomes a hazard and its usage becomes impractical. So, we devise CUL (Centrality with Unsupervised Learning) method to learn the relative EC scores in a network in an unsupervised manner. To achieve this, we develop an Encoder-Decoder based framework that maps the nodes to their respective estimated EC scores. Extensive experiments were conducted on different synthetic and real-world networks. We compared CUL against a baseline supervised method for EC estimation similar to some of the past works. It was observed that even with training on a minuscule number of training datasets, CUL delivers a relatively better accuracy score when identifying the higher ranked nodes than its supervised counterpart. We also show that CUL is much faster and has a smaller runtime than the conventional baseline method for EC computation. The code is available at https://github.com/codexhammer/CUL.
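
For reference, the quantity CUL learns to approximate: eigenvector centrality assigns each node the corresponding entry of the principal eigenvector of the adjacency matrix, classically computed by power iteration (a standard sketch, not the paper's method):

```python
# Standard power iteration for eigenvector centrality of an adjacency matrix.
import numpy as np

def eigenvector_centrality(A, iters=100, tol=1e-8):
    x = np.ones(A.shape[0]) / A.shape[0]
    for _ in range(iters):
        x_new = A @ x                     # propagate scores along edges
        x_new /= np.linalg.norm(x_new)    # normalize to avoid blow-up
        if np.linalg.norm(x_new - x) < tol:
            break
        x = x_new
    return x_new

A = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
print(eigenvector_centrality(A).round(3))
```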

【6】 Learning Perceptual Concepts by Bootstrapping from Human Queries
Link: https://arxiv.org/abs/2111.05251

Authors: Andreea Bobu, Chris Paxton, Wei Yang, Balakumar Sundaralingam, Yu-Wei Chao, Maya Cakmak, Dieter Fox
Affiliations: University of Washington
Notes: 7 pages, 7 figures
Abstract: Robots need to be able to learn concepts from their users in order to adapt their capabilities to each user's unique task. But when the robot operates on high-dimensional inputs, like images or point clouds, this is impractical: the robot needs an unrealistic amount of human effort to learn the new concept. To address this challenge, we propose a new approach whereby the robot learns a low-dimensional variant of the concept and uses it to generate a larger data set for learning the concept in the high-dimensional space. This lets it take advantage of semantically meaningful privileged information only accessible at training time, like object poses and bounding boxes, that allows for richer human interaction to speed up learning. We evaluate our approach by learning prepositional concepts that describe object state or multi-object relationships, like above, near, or aligned, which are key to user specification of task goals and execution constraints for robots. Using a simulated human, we show that our approach improves sample complexity when compared to learning concepts directly in the high-dimensional space. We also demonstrate the utility of the learned concepts in motion planning tasks on a 7-DoF Franka Panda robot.

【7】 Reason first, then respond: Modular Generation for Knowledge-infused Dialogue
Link: https://arxiv.org/abs/2111.05204

Authors: Leonard Adolphs, Kurt Shuster, Jack Urbanek, Arthur Szlam, Jason Weston
Affiliations: ETH Zürich, Facebook AI Research
Abstract: Large language models can produce fluent dialogue but often hallucinate factual inaccuracies. While retrieval-augmented models help alleviate this issue, they still face a difficult challenge of both reasoning to provide correct knowledge and generating conversation simultaneously. In this work, we propose a modular model, Knowledge to Response (K2R), for incorporating knowledge into conversational agents, which breaks down this problem into two easier steps. K2R first generates a knowledge sequence, given a dialogue context, as an intermediate step. After this "reasoning step", the model then attends to its own generated knowledge sequence, as well as the dialogue context, to produce a final response. In detailed experiments, we find that such a model hallucinates less in knowledge-grounded dialogue tasks, and has advantages in terms of interpretability and modularity. In particular, it can be used to fuse QA and dialogue systems together to enable dialogue agents to give knowledgeable answers, or QA models to give conversational responses in a zero-shot setting.
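
A hedged sketch of the two-step K2R decomposition (the model interfaces and the `__knowledge__` separator below are illustrative assumptions, not the authors' code):

```python
# Sketch: generate an intermediate knowledge sequence first, then condition
# the response on both the dialogue context and that generated knowledge.
def k2r_respond(knowledge_model, response_model, dialogue_context):
    knowledge = knowledge_model.generate(dialogue_context)    # "reasoning step"
    prompt = f"{dialogue_context}\n__knowledge__ {knowledge}"
    return response_model.generate(prompt)                    # final response

class EchoModel:  # stand-in for a real seq2seq model, just for a dry run
    def __init__(self, prefix): self.prefix = prefix
    def generate(self, text): return f"{self.prefix}({text.splitlines()[-1]})"

print(k2r_respond(EchoModel("K"), EchoModel("R"), "Who wrote Hamlet?"))
```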

【8】 Does Thermal data make the detection systems more reliable?
Link: https://arxiv.org/abs/2111.05191

Authors: Shruthi Gowda, Bahram Zonooz, Elahe Arani
Affiliations: Advanced Research Lab, NavInfo Europe, The Netherlands
Notes: Accepted at NeurIPS 2021 - ML4AD workshop (the code for this research is available at: this https URL)
Abstract: Deep learning-based detection networks have made remarkable progress in autonomous driving systems (ADS). ADS should have reliable performance across a variety of ambient lighting and adverse weather conditions. However, luminance degradation and visual obstructions (such as glare, fog) result in poor quality images by the visual camera which leads to performance decline. To overcome these challenges, we explore the idea of leveraging a different data modality that is disparate yet complementary to the visual data. We propose a comprehensive detection system based on a multimodal-collaborative framework that learns from both RGB (from visual cameras) and thermal (from Infrared cameras) data. This framework trains two networks collaboratively and provides flexibility in learning optimal features of its own modality while also incorporating the complementary knowledge of the other. Our extensive empirical results show that while the improvement in accuracy is nominal, the value lies in challenging and extremely difficult edge cases which is crucial in safety-critical applications such as AD. We provide a holistic view of both merits and limitations of using a thermal imaging system in detection.

【9】 Self-checking Logical Agents
Link: https://arxiv.org/abs/2111.05157

Authors: Stefania Costantini
Affiliations: Dip. di Ingegneria e Scienze dell'Informazione (DISIM), Università di L'Aquila, Coppito
Abstract: This paper presents a comprehensive framework for run-time self-checking of logical agents, by means of temporal axioms to be dynamically checked. These axioms are specified by using an agent-oriented interval temporal logic defined to this purpose. We define syntax, semantics and pragmatics for this new logic, specifically tailored for application to agents. In the resulting framework, we encompass and extend our past work.

【10】 Losses, Dissonances, and Distortions
Link: https://arxiv.org/abs/2111.05128

Authors: Pablo Samuel Castro
Affiliations: Google Research, Brain Team
Notes: In the 5th Machine Learning for Creativity and Design Workshop at NeurIPS 2021
Abstract: In this paper I present a study in using the losses and gradients obtained during the training of a simple function approximator as a mechanism for creating musical dissonance and visual distortion in a solo piano performance setting. These dissonances and distortions become part of an artistic performance not just by affecting the visualizations, but also by affecting the artistic musical performance. The system is designed such that the performer can in turn affect the training process itself, thereby creating a closed feedback loop between two processes: the training of a machine learning model and the performance of an improvised piano piece.

【11】 "How Does It Detect A Malicious App?" Explaining the Predictions of AI-based Android Malware Detector 链接:https://arxiv.org/abs/2111.05108

Authors: Zhi Lu, Vrizlynn L. L. Thing
Affiliations: Cyber Security Strategic Technology Centre, ST Engineering, Singapore
Abstract: AI methods have been proven to yield impressive performance on Android malware detection. However, most AI-based methods make predictions of suspicious samples in a black-box manner without transparency on models' inference. The expectation on models' explainability and transparency by cyber security and AI practitioners to assure the trustworthiness increases. In this article, we present a novel model-agnostic explanation method for AI models applied for Android malware detection. Our proposed method identifies and quantifies the data features relevance to the predictions by two steps: i) data perturbation that generates the synthetic data by manipulating features' values; and ii) optimization of features attribution values to seek significant changes of prediction scores on the perturbed data with minimal feature values changes. The proposed method is validated by three experiments. We firstly demonstrate that our proposed model explanation method can aid in discovering how AI models are evaded by adversarial samples quantitatively. In the following experiments, we compare the explainability and fidelity of our proposed method with state-of-the-art methods, respectively.
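
A simplified sketch of the perturbation step described above, with a toy linear model standing in for a real detector (hypothetical, not the paper's implementation, which optimizes attribution values jointly rather than scanning features one by one):

```python
# Sketch: perturb each feature and score it by the resulting prediction shift.
import numpy as np

def attribution_by_perturbation(model, x, eps=1.0):
    base = model(x)
    scores = np.zeros(len(x))
    for i in range(len(x)):
        x_pert = x.copy()
        x_pert[i] += eps                       # minimal feature-value change
        scores[i] = abs(model(x_pert) - base)  # change in prediction score
    return scores

model = lambda x: float(x @ np.array([0.5, -1.0, 2.0]))  # toy "detector"
print(attribution_by_perturbation(model, np.array([1.0, 2.0, 3.0])))
```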

【12】 MixACM: Mixup-Based Robustness Transfer via Distillation of Activated Channel Maps
Link: https://arxiv.org/abs/2111.05073

Authors: Muhammad Awais, Fengwei Zhou, Chuanlong Xie, Jiawei Li, Sung-Ho Bae, Zhenguo Li
Affiliations: Huawei Noah's Ark Lab; Department of Computer Science, Kyung-Hee University, South Korea
Notes: Accepted by NeurIPS 2021
Abstract: Deep neural networks are susceptible to adversarially crafted, small and imperceptible changes in the natural inputs. The most effective defense mechanism against these examples is adversarial training which constructs adversarial examples during training by iterative maximization of loss. The model is then trained to minimize the loss on these constructed examples. This min-max optimization requires more data, larger capacity models, and additional computing resources. It also degrades the standard generalization performance of a model. Can we achieve robustness more efficiently? In this work, we explore this question from the perspective of knowledge transfer. First, we theoretically show the transferability of robustness from an adversarially trained teacher model to a student model with the help of mixup augmentation. Second, we propose a novel robustness transfer method called Mixup-Based Activated Channel Maps (MixACM) Transfer. MixACM transfers robustness from a robust teacher to a student by matching activated channel maps generated without expensive adversarial perturbations. Finally, extensive experiments on multiple datasets and different learning scenarios show our method can transfer robustness while also improving generalization on natural images.
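
For context, the mixup augmentation the robustness-transfer analysis relies on (the standard formulation of Zhang et al., 2018; a generic sketch, not the MixACM pipeline):

```python
# Standard mixup: train on convex combinations of input pairs and their labels.
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0):
    lam = np.random.beta(alpha, alpha)  # mixing coefficient ~ Beta(alpha, alpha)
    x = lam * x1 + (1 - lam) * x2       # blended input
    y = lam * y1 + (1 - lam) * y2       # blended (one-hot) label
    return x, y

x, y = mixup(np.ones(4), np.array([1., 0.]), np.zeros(4), np.array([0., 1.]))
print(x, y)
```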

【13】 Conformity Assessments and Post-market Monitoring: A Guide to the Role of Auditing in the Proposed European AI Regulation
Link: https://arxiv.org/abs/2111.05071

Authors: Jakob Mokander, Maria Axente, Federico Casolari, Luciano Floridi
Affiliations: Oxford Internet Institute, University of Oxford, UK; Member of the Advisory Board, UK All Party Parliamentary Group on AI (APPG AI), London, UK
Abstract: The proposed European Artificial Intelligence Act (AIA) is the first attempt to elaborate a general legal framework for AI carried out by any major global economy. As such, the AIA is likely to become a point of reference in the larger discourse on how AI systems can (and should) be regulated. In this article, we describe and discuss the two primary enforcement mechanisms proposed in the AIA: the conformity assessments that providers of high-risk AI systems are expected to conduct, and the post-market monitoring plans that providers must establish to document the performance of high-risk AI systems throughout their lifetimes. We argue that AIA can be interpreted as a proposal to establish a Europe-wide ecosystem for conducting AI auditing, albeit in other words. Our analysis offers two main contributions. First, by describing the enforcement mechanisms included in the AIA in terminology borrowed from existing literature on AI auditing, we help providers of AI systems understand how they can prove adherence to the requirements set out in the AIA in practice. Second, by examining the AIA from an auditing perspective, we seek to provide transferable lessons from previous research about how to refine further the regulatory approach outlined in the AIA. We conclude by highlighting seven aspects of the AIA where amendments (or simply clarifications) would be helpful. These include, above all, the need to translate vague concepts into verifiable criteria and to strengthen the institutional safeguards concerning conformity assessments based on internal checks.

【14】 Almost Optimal Universal Lower Bound for Learning Causal DAGs with Atomic Interventions
Link: https://arxiv.org/abs/2111.05070

Authors: Vibhor Porwal, Piyush Srivastava, Gaurav Sinha
Affiliations: Tata Institute of Fundamental Research
Abstract: A well-studied challenge that arises in the structure learning problem of causal directed acyclic graphs (DAG) is that using observational data, one can only learn the graph up to a "Markov equivalence class" (MEC). The remaining undirected edges have to be oriented using interventions, which can be very expensive to perform in applications. Thus, the problem of minimizing the number of interventions needed to fully orient the MEC has received a lot of recent attention, and is also the focus of this work. We prove two main results. The first is a new universal lower bound on the number of atomic interventions that any algorithm (whether active or passive) would need to perform in order to orient a given MEC. Our second result shows that this bound is, in fact, within a factor of two of the size of the smallest set of atomic interventions that can orient the MEC. Our lower bound is provably better than previously known lower bounds. The proof of our lower bound is based on the new notion of CBSP orderings, which are topological orderings of DAGs without v-structures and satisfy certain special properties. Further, using simulations on synthetic graphs and by giving examples of special graph families, we show that our bound is often significantly better.

【15】 Neural News Recommendation with Event Extraction
Link: https://arxiv.org/abs/2111.05068

Authors: Songqiao Han, Hailiang Huang, Jiangwei Liu
Affiliations: School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China
Notes: 11 pages, 4 figures, 2 tables
Abstract: A key challenge of online news recommendation is to help users find articles they are interested in. Traditional news recommendation methods usually use single news information, which is insufficient to encode news and user representation. Recent research uses multiple channel news information, e.g., title, category, and body, to enhance news and user representation. However, these methods only use various attention mechanisms to fuse multi-view embeddings without considering deep digging higher-level information contained in the context. These methods encode news content on the word level and jointly train the attention parameters in the recommendation network, leading to more corpora being required to train the model. We propose an Event Extraction-based News Recommendation (EENR) framework to overcome these shortcomings, utilizing event extraction to abstract higher-level information. EENR also uses a two-stage strategy to reduce parameters in subsequent parts of the recommendation network. We train the Event Extraction module by external corpora in the first stage and apply the trained model to the news recommendation dataset to predict event-level information, including event types, roles, and arguments, in the second stage. Then we fuse multiple channel information, including event information, news title, and category, to encode news and users. Extensive experiments on a real-world dataset show that our EENR method can effectively improve the performance of news recommendations. Finally, we also explore the reasonability of utilizing higher abstract level information to substitute news body content.

【16】 Tightening the Approximation Error of Adversarial Risk with Auto Loss Function Search
Link: https://arxiv.org/abs/2111.05063

Authors: Pengfei Xia, Ziqiang Li, Bin Li
Affiliations: Department of Electronic Engineering and Information Science, University of Science and Technology of China
Abstract: Numerous studies have demonstrated that deep neural networks are easily misled by adversarial examples. Effectively evaluating the adversarial robustness of a model is important for its deployment in practical applications. Currently, a common type of evaluation is to approximate the adversarial risk of a model as a robustness indicator by constructing malicious instances and executing attacks. Unfortunately, there is an error (gap) between the approximate value and the true value. Previous studies manually design attack methods to achieve a smaller error, which is inefficient and may miss a better solution. In this paper, we establish the tightening of the approximation error as an optimization problem and try to solve it with an algorithm. More specifically, we first analyze that replacing the non-convex and discontinuous 0-1 loss with a surrogate loss, a necessary compromise in calculating the approximation, is one of the main reasons for the error. Then we propose AutoLoss-AR, the first method for searching loss functions for tightening the approximation error of adversarial risk. Extensive experiments are conducted in multiple settings. The results demonstrate the effectiveness of the proposed method: the best-discovered loss functions outperform the handcrafted baseline by 0.9%-2.9% and 0.7%-2.0% on MNIST and CIFAR-10, respectively. Besides, we also verify that the searched losses can be transferred to other settings and explore why they are better than the baseline by visualizing the local loss landscape.
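
To make the gap concrete (standard definitions consistent with the abstract, not equations copied from the paper): the true adversarial risk uses the 0-1 loss, while an attack searches for the perturbation with a surrogate loss $\ell$, giving

$$R_{\mathrm{adv}}(f)=\mathbb{E}_{(x,y)}\Big[\max_{\|\delta\|\le\epsilon}\mathbf{1}\{f(x+\delta)\neq y\}\Big],\qquad \hat{R}_{\mathrm{adv}}(f)=\mathbb{E}_{(x,y)}\Big[\mathbf{1}\{f(x+\hat{\delta})\neq y\}\Big],\quad \hat{\delta}=\arg\max_{\|\delta\|\le\epsilon}\ell\big(f(x+\delta),y\big).$$

Because $\hat{\delta}$ maximizes $\ell$ rather than the 0-1 objective, $\hat{R}_{\mathrm{adv}}(f)\le R_{\mathrm{adv}}(f)$; the difference between the two is the approximation error that AutoLoss-AR tightens by searching over choices of $\ell$.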

【17】 An effective hybrid search algorithm for the multiple traveling repairman problem with profits
Link: https://arxiv.org/abs/2111.05017

Authors: Jintong Ren, Jin-Kao Hao, Feng Wu, Zhang-Hua Fu
Affiliations: The Chinese University of Hong Kong, Shenzhen, P.R. China; University of Science and Technology of China, Hefei, P.R. China; LERIA, Université d'Angers, boulevard Lavoisier, Angers, France
Notes: 35 pages
Abstract: As an extension of the traveling repairman problem with profits, the multiple traveling repairman problem with profits consists of multiple repairmen who visit a subset of all customers to maximize the revenues collected through the visited customers. To solve this challenging problem, an effective hybrid search algorithm based on the memetic algorithm framework is proposed. It integrates two distinguished features: a dedicated arc-based crossover to generate high-quality offspring solutions and a fast evaluation technique to reduce the complexity of exploring the classical neighborhoods. We show the competitiveness of the algorithm on 470 benchmark instances compared to the leading reference algorithms and report new best records for 137 instances as well as equal best results for other 330 instances. We investigate the importance of the key search components for the algorithm.

【18】 Misspecified Gaussian Process Bandit Optimization
Link: https://arxiv.org/abs/2111.05008

Authors: Ilija Bogunovic, Andreas Krause
Affiliations: ETH Zürich
Notes: Accepted to NeurIPS 2021
Abstract: We consider the problem of optimizing a black-box function based on noisy bandit feedback. Kernelized bandit algorithms have shown strong empirical and theoretical performance for this problem. They heavily rely on the assumption that the model is well-specified, however, and can fail without it. Instead, we introduce a misspecified kernelized bandit setting where the unknown function can be $\epsilon$--uniformly approximated by a function with a bounded norm in some Reproducing Kernel Hilbert Space (RKHS). We design efficient and practical algorithms whose performance degrades minimally in the presence of model misspecification. Specifically, we present two algorithms based on Gaussian process (GP) methods: an optimistic EC-GP-UCB algorithm that requires knowing the misspecification error, and Phased GP Uncertainty Sampling, an elimination-type algorithm that can adapt to unknown model misspecification. We provide upper bounds on their cumulative regret in terms of $\epsilon$, the time horizon, and the underlying kernel, and we show that our algorithm achieves optimal dependence on $\epsilon$ with no prior knowledge of misspecification. In addition, in a stochastic contextual setting, we show that EC-GP-UCB can be effectively combined with the regret bound balancing strategy and attain similar regret bounds despite not knowing $\epsilon$.
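
For background, the generic GP-UCB acquisition step that EC-GP-UCB builds on (a standard sketch with an RBF kernel, not the authors' algorithm, which additionally enlarges the confidence width to account for the misspecification error $\epsilon$):

```python
# Sketch of a GP-UCB step: compute the GP posterior on candidates and pick
# the point maximizing posterior mean + exploration bonus.
import numpy as np

def rbf(X1, X2, ls=0.5):
    d = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d / ls**2)

def gp_ucb_step(X_obs, y_obs, X_cand, beta=4.0, noise=1e-2):
    K = rbf(X_obs, X_obs) + noise * np.eye(len(X_obs))
    Ks = rbf(X_cand, X_obs)
    Kinv = np.linalg.inv(K)
    mu = Ks @ Kinv @ y_obs                                  # posterior mean
    var = 1.0 - np.einsum('ij,jk,ik->i', Ks, Kinv, Ks)      # posterior variance
    return X_cand[np.argmax(mu + np.sqrt(beta) * np.sqrt(np.clip(var, 0, None)))]

X_obs = np.array([[0.1], [0.5], [0.9]])
y_obs = np.array([0.2, 0.8, 0.3])
X_cand = np.linspace(0, 1, 101)[:, None]
print(gp_ucb_step(X_obs, y_obs, X_cand))
```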

【19】 Phantom: A High-Performance Computational Core for Sparse Convolutional Neural Networks
Link: https://arxiv.org/abs/2111.05002

Authors: Mahmood Azhar Qureshi, Arslan Munir
Affiliations: State University, USA
Notes: A version of this work is currently under review at ACM Transactions on Embedded Computing Systems (TECS)
Abstract: Sparse convolutional neural networks (CNNs) have gained significant traction over the past few years as sparse CNNs can drastically decrease the model size and computations, if exploited befittingly, as compared to their dense counterparts. Sparse CNNs often introduce variations in the layer shapes and sizes, which can prevent dense accelerators from performing well on sparse CNN models. Recently proposed sparse accelerators like SCNN, Eyeriss v2, and SparTen, actively exploit the two-sided or full sparsity, that is, sparsity in both weights and activations, for performance gains. These accelerators, however, either have inefficient micro-architecture, which limits their performance, have no support for non-unit stride convolutions and fully-connected (FC) layers, or suffer massively from systematic load imbalance. To circumvent these issues and support both sparse and dense models, we propose Phantom, a multi-threaded, dynamic, and flexible neural computational core. Phantom uses sparse binary mask representation to actively lookahead into sparse computations, and dynamically schedule its computational threads to maximize the thread utilization and throughput. We also generate a two-dimensional (2D) mesh architecture of Phantom neural computational cores, which we refer to as Phantom-2D accelerator, and propose a novel dataflow that supports all layers of a CNN, including unit and non-unit stride convolutions, and FC layers. In addition, Phantom-2D uses a two-level load balancing strategy to minimize the computational idling, thereby, further improving the hardware utilization. To show support for different types of layers, we evaluate the performance of the Phantom architecture on VGG16 and MobileNet. Our simulations show that the Phantom-2D accelerator attains a performance gain of 12x, 4.1x, 1.98x, and 2.36x, over dense architectures, SCNN, SparTen, and Eyeriss v2, respectively.
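
A toy software analogue of the two-sided sparsity being exploited (conceptual only; Phantom is a hardware core, not Python): multiply-accumulate only where the binary masks of both the weight and the activation are nonzero.

```python
# Toy illustration: binary masks let the core skip zero-product terms entirely.
import numpy as np

def sparse_dot(w, a):
    mask = (w != 0) & (a != 0)        # two-sided sparsity: both must be nonzero
    return float(np.sum(w[mask] * a[mask]))

w = np.array([0.0, 0.3, 0.0, -1.2])
a = np.array([1.0, 0.0, 2.0, 0.5])
print(sparse_dot(w, a))               # only index 3 contributes: -0.6
```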

【20】 Learning Numerical Action Models from Noisy Input Data
Link: https://arxiv.org/abs/2111.04997

Authors: José Á. Segura-Muros, Juan Fernández-Olivares, Raúl Pérez
Affiliations: Universidad de Granada, Campus Universitario de Cartuja
Abstract: This paper presents the PlanMiner-N algorithm, a domain learning technique based on the PlanMiner domain learning algorithm. The algorithm presented here improves the learning capabilities of PlanMiner when using noisy data as input. The PlanMiner algorithm is able to infer arithmetic and logical expressions to learn numerical planning domains from the input data, but it was designed to work under situations of incompleteness making it unreliable when facing noisy input data. In this paper, we propose a series of enhancements to the learning process of PlanMiner to expand its capabilities to learn from noisy data. These methods preprocess the input data by detecting noise and filtering it and study the learned action models to find erroneous preconditions/effects in them. The methods proposed in this paper were tested using a set of domains from the International Planning Competition (IPC). The results obtained indicate that PlanMiner-N improves the performance of PlanMiner greatly when facing noisy input data.

【21】 Ultra-Low Power Keyword Spotting at the Edge
Link: https://arxiv.org/abs/2111.04988

Authors: Mehmet Gorkem Ulkar, Osman Erman Okman
Affiliations: Analog Devices Inc., Istanbul, Turkey
Notes: 5 pages, 5 figures
Abstract: Keyword spotting (KWS) has become an indispensable part of many intelligent devices surrounding us, as audio is one of the most efficient ways of interacting with these devices. The accuracy and performance of KWS solutions have been the main focus of the researchers, and thanks to deep learning, substantial progress has been made in this domain. However, as the use of KWS spreads into IoT devices, energy efficiency becomes a very critical requirement besides the performance. We believe KWS solutions that would seek power optimization both in the hardware and the neural network (NN) model architecture are advantageous over many solutions in the literature where mostly the architecture side of the problem is considered. In this work, we designed an optimized KWS CNN model by considering end-to-end energy efficiency for the deployment at MAX78000, an ultra-low-power CNN accelerator. With the combined hardware and model optimization approach, we achieve 96.3% accuracy for 12 classes while only consuming 251 uJ per inference. We compare our results with other small-footprint neural network-based KWS solutions in the literature. Additionally, we share the energy consumption of our model in power-optimized ARM Cortex-M4F to depict the effectiveness of the chosen hardware for the sake of clarity.

【22】 Dynamic Parameterized Network for CTR Prediction
Link: https://arxiv.org/abs/2111.04983

Authors: Jian Zhu, Congcong Liu, Pei Wang, Xiwei Zhao, Guangpeng Chen, Junsheng Jin, Changping Peng, Zhangang Lin, Jingping Shao
Abstract: Learning to capture feature relations effectively and efficiently is essential in click-through rate (CTR) prediction of modern recommendation systems. Most existing CTR prediction methods model such relations either through tedious manually-designed low-order interactions or through inflexible and inefficient high-order interactions, which both require extra DNN modules for implicit interaction modeling. In this paper, we proposed a novel plug-in operation, Dynamic Parameterized Operation (DPO), to learn both explicit and implicit interaction instance-wisely. We showed that the introduction of DPO into DNN modules and Attention modules can respectively benefit two main tasks in CTR prediction, enhancing the adaptiveness of feature-based modeling and improving user behavior modeling with the instance-wise locality. Our Dynamic Parameterized Networks significantly outperforms state-of-the-art methods in the offline experiments on the public dataset and real-world production dataset, together with an online A/B test. Furthermore, the proposed Dynamic Parameterized Networks has been deployed in the ranking system of one of the world's largest e-commerce companies, serving the main traffic of hundreds of millions of active users.

【23】 American Hate Crime Trends Prediction with Event Extraction
Link: https://arxiv.org/abs/2111.04951

Authors: Songqiao Han, Hailiang Huang, Jiangwei Liu, Shengsheng Xiao
Affiliations: School of Information Management and Engineering, Shanghai University of Finance and Economics, Shanghai, China
Notes: 12 pages, 5 figures, 4 tables
Abstract: Social media platforms may provide potential space for discourses that contain hate speech, and even worse, can act as a propagation mechanism for hate crimes. The FBI's Uniform Crime Reporting (UCR) Program collects hate crime data and releases statistical reports yearly. These statistics provide information in determining national hate crime trends. The statistics can also provide valuable holistic and strategic insight for law enforcement agencies or justify lawmakers for specific legislation. However, the reports are mostly released next year and lag behind many immediate needs. Recent research mainly focuses on hate speech detection in social media text or empirical studies on the impact of a confirmed crime. This paper proposes a framework that first utilizes text mining techniques to extract hate crime events from New York Times news, then uses the results to facilitate predicting American national-level and state-level hate crime trends. Experimental results show that our method can significantly enhance the prediction performance compared with time series or regression methods without event-related factors. Our framework broadens the methods of national-level and state-level hate crime trends prediction.

【24】 How to Train Your Neural Network: A Comparative Evaluation
Link: https://arxiv.org/abs/2111.04949

Authors: Shu-Huai Lin, Daniel Nichols, Siddharth Singh, Abhinav Bhatele
Affiliations: Department of Computer Science, University of Maryland, College Park, MD
Abstract: The field of deep learning has witnessed a remarkable shift towards extremely compute- and memory-intensive neural networks. These newer larger models have enabled researchers to advance state-of-the-art tools across a variety of fields. This phenomenon has spurred the development of algorithms for distributed training of neural networks over a larger number of hardware accelerators. In this paper, we discuss and compare current state-of-the-art frameworks for large scale distributed deep learning. First, we survey current practices in distributed learning and identify the different types of parallelism used. Then, we present empirical results comparing their performance on large image and language training tasks. Additionally, we address their statistical efficiency and memory consumption behavior. Based on our results, we discuss algorithmic and implementation portions of each framework which hinder performance.

【25】 DSBERT: Unsupervised Dialogue Structure Learning with BERT
Link: https://arxiv.org/abs/2111.04933

Authors: Bingkun Chen, Shaobing Dai, Shenghua Zheng, Lei Liao, Yang Li
Abstract: Unsupervised dialogue structure learning is an important and meaningful task in natural language processing. The extracted dialogue structure and process can help analyze human dialogue, and play a vital role in the design and evaluation of dialogue systems. The traditional dialogue system requires experts to manually design the dialogue structure, which is very costly. But through unsupervised dialogue structure learning, dialogue structure can be automatically obtained, reducing the cost of developers constructing dialogue process. The learned dialogue structure can be used to promote the dialogue generation of the downstream task system, and improve the logic and consistency of the dialogue robot's reply. In this paper, we propose a Bert-based unsupervised dialogue structure learning algorithm DSBERT (Dialogue Structure BERT). Different from the previous SOTA models VRNN and SVRNN, we combine BERT and AutoEncoder, which can effectively combine context information. In order to better prevent the model from falling into the local optimal solution and make the dialogue state distribution more uniform and reasonable, we also propose three balanced loss functions that can be used for dialogue structure learning. Experimental results show that DSBERT can generate a dialogue structure closer to the real structure, can distinguish sentences with different semantics and map them to different hidden states.

【26】 Building an AI-ready RSE Workforce
Link: https://arxiv.org/abs/2111.04916

Authors: Ying Zhang, Matthew A. Gitzendanner, Dan S. Maxwell, Justin W. Richardson, Kaleb E. Smith, Eric A. Stubbs, Brian J. Stucky, Jingchao Zhang, Erik Deumens
Affiliations: University of Florida
Notes: 3 pages. Research Software Engineers in HPC Workshop (RSE-HPC-2021) at SC21
Abstract: Artificial Intelligence has been transforming industries and academic research across the globe, and research software development is no exception. Machine learning and deep learning are being applied in every aspect of the research software development lifecycles, from new algorithm design paradigms to software development processes. In this paper, we discuss our views on today's challenges and opportunities that AI has presented on research software development and engineers, and the approaches we, at the University of Florida, are taking to prepare our workforce for the new era of AI.

【27】 FPM: A Collection of Large-scale Foundation Pre-trained Language Models
Link: https://arxiv.org/abs/2111.04909

Authors: Dezhou Shen
Affiliations: Department of Computer Science, rct ai, Beijing, CN
Abstract: Recent work in language modeling has shown that training large-scale Transformer models has promoted the latest developments in natural language processing applications. However, there is very little work to unify the current effective models. In this work, we use the current effective model structure to launch a model set through the current most mainstream technology. We think this will become the basic model in the future. For Chinese, using the GPT-2[9] model, a 10.3 billion parameter language model was trained on the Chinese dataset, and, in particular, a 2.9 billion parameter language model based on dialogue data was trained; the BERT model was trained on the Chinese dataset with 495 million parameters; the Transformer model has trained a language model with 5.6 billion parameters on the Chinese dataset. In English, corresponding training work has also been done. Using the GPT-2 model, a language model with 6.4 billion parameters was trained on the English dataset; the BERT[3] model trained a language model with 1.24 billion parameters on the English dataset, and in particular, it trained a 688 million parameter language model based on single-card training technology; the Transformer model trained a language model with 5.6 billion parameters on the English dataset. In the TNEWS classification task evaluated by CLUE[13], the BERT-C model exceeded the 59.46% accuracy of ALBERT-xxlarge with an accuracy rate of 59.99%, an increase of 0.53%. In the QQP classification task evaluated by GLUE[11], the accuracy rate of 78.95% surpassed the accuracy rate of BERT-Large of 72.1%, an increase of 6.85%. Compared with the current accuracy rate of ERNIE, the first place in the GLUE evaluation of 75.2%, an increase of 3.75%.

【28】 Safe Policy Optimization with Local Generalized Linear Function Approximations
Link: https://arxiv.org/abs/2111.04894

Authors: Akifumi Wachi, Yunyue Wei, Yanan Sui
Affiliations: IBM Research, Tsinghua University
Notes: 18 pages, 6 figures, Accepted to NeurIPS 2021
Abstract: Safe exploration is a key to applying reinforcement learning (RL) in safety-critical systems. Existing safe exploration methods guaranteed safety under the assumption of regularity, and it has been difficult to apply them to large-scale real problems. We propose a novel algorithm, SPO-LF, that optimizes an agent's policy while learning the relation between a locally available feature obtained by sensors and environmental reward/safety using generalized linear function approximations. We provide theoretical guarantees on its safety and optimality. We experimentally show that our algorithm is 1) more efficient in terms of sample complexity and computational cost and 2) more applicable to large-scale problems than previous safe RL methods with theoretical guarantees, and 3) comparably sample-efficient and safer compared with existing advanced deep RL methods with safety constraints.

【29】 User Centered Design (VI): Human Factors Approaches for Intelligent Human-Computer Interaction
Link: https://arxiv.org/abs/2111.04880

Authors: Wei Xu
Affiliations: Zhejiang University
Notes: in Chinese
Abstract: Starting from the design philosophy of "user-centered design", this paper analyzes the human factors characteristics of intelligent human-computer interaction (iHCI) and proposes a concept of "user-oriented iHCI". It further proposes a new human factors framework for iHCI based on the theories of joint cognitive systems, situation awareness, and intelligent agents. With the help of the new concept and framework, the paper analyzes the human factors issues in the ecosystem of autonomous vehicle co-driving and layouts future research agenda. Finally, the paper analyzes the two important research areas in iHCI (i.e., user intention recognition, human-computer collaboration) and points out the focus of human factors research in the future.

【30】 EvoLearner: Learning Description Logics with Evolutionary Algorithms
Link: https://arxiv.org/abs/2111.04879

Authors: Stefan Heindorf, Lukas Blübaum, Nick Düsterhus, Till Werner, Varun Nandkumar Golani, Caglar Demir, Axel-Cyrille Ngonga Ngomo
Notes: 11 pages, 3 figures, 9 tables, 3 algorithms
Abstract: Classifying nodes in knowledge graphs is an important task, e.g., predicting missing types of entities, predicting which molecules cause cancer, or predicting which drugs are promising treatment candidates. While black-box models often achieve high predictive performance, they are only post-hoc and locally explainable and do not allow the learned model to be easily enriched with domain knowledge. Towards this end, learning description logic concepts from positive and negative examples has been proposed. However, learning such concepts often takes a long time and state-of-the-art approaches provide limited support for literal data values, although they are crucial for many applications. In this paper, we propose EvoLearner - an evolutionary approach to learn ALCQ(D), which is the attributive language with complement (ALC) paired with qualified cardinality restrictions (Q) and data properties (D). We contribute a novel initialization method for the initial population: starting from positive examples (nodes in the knowledge graph), we perform biased random walks and translate them to description logic concepts. Moreover, we improve support for data properties by maximizing information gain when deciding where to split the data. We show that our approach significantly outperforms the state of the art on the benchmarking framework SML-Bench for structured machine learning. Our ablation study confirms that this is due to our novel initialization method and support for data properties.
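
A hedged sketch of the initialization idea (not the authors' code): start a walk at a positive-example node and follow random edges; the visited nodes and edge types would then be translated into a description logic concept.

```python
# Sketch: random-walk initialization over a toy knowledge graph, represented
# here as a plain adjacency dict (a simplifying assumption; EvoLearner's
# walks are biased rather than uniform).
import random

def random_walk(graph, start, length=3, seed=0):
    rng = random.Random(seed)
    node, path = start, [start]
    for _ in range(length):
        neighbors = graph.get(node, [])
        if not neighbors:
            break
        node = rng.choice(neighbors)    # a biased walk would weight this choice
        path.append(node)
    return path

kg = {"lysine": ["aminoAcid"], "aminoAcid": ["molecule"], "molecule": []}
print(random_walk(kg, "lysine"))        # ['lysine', 'aminoAcid', 'molecule']
```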

【31】 An Instance-Dependent Analysis for the Cooperative Multi-Player Multi-Armed Bandit
Link: https://arxiv.org/abs/2111.04873

Authors: Aldo Pacchiano, Peter Bartlett, Michael I. Jordan
Affiliations: Microsoft Research, NYC; University of California, Berkeley
Notes: 44 pages
Abstract: We study the problem of information sharing and cooperation in Multi-Player Multi-Armed bandits. We propose the first algorithm that achieves logarithmic regret for this problem. Our results are based on two innovations. First, we show that a simple modification to a successive elimination strategy can be used to allow the players to estimate their suboptimality gaps, up to constant factors, in the absence of collisions. Second, we leverage the first result to design a communication protocol that successfully uses the small reward of collisions to coordinate among players, while preserving meaningful instance-dependent logarithmic regret guarantees.
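
For background, the classic single-player successive elimination scheme that the paper modifies for gap estimation (a generic sketch, not the cooperative multi-player algorithm):

```python
# Sketch: pull every active arm once per round; drop arms whose upper
# confidence bound falls below the best arm's lower confidence bound.
import numpy as np

def successive_elimination(pull, n_arms, horizon, delta=0.05):
    active = list(range(n_arms))
    means = np.zeros(n_arms)
    for t in range(1, horizon // n_arms + 1):
        for a in active:
            means[a] += (pull(a) - means[a]) / t   # running mean after t pulls
        rad = np.sqrt(np.log(2 * n_arms * t * t / delta) / (2 * t))
        best = max(means[a] for a in active)
        active = [a for a in active if means[a] + rad >= best - rad]
        if len(active) == 1:
            break
    return active

rng = np.random.default_rng(0)
true_means = [0.2, 0.5, 0.8]
print(successive_elimination(lambda a: rng.normal(true_means[a], 1.0), 3, 3000))
```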

【32】 Explaining Face Presentation Attack Detection Using Natural Language Link: https://arxiv.org/abs/2111.04862

Authors: Hengameh Mirzaalian, Mohamed E. Hussein, Leonidas Spinoulas, Jonathan May, Wael Abd-Almageed. Affiliation: University of Southern California, Information Sciences Institute, Marina del Rey, CA, USA. Comments: To appear in the Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition 2021. Abstract: A large number of deep neural network based techniques have been developed to address the challenging problem of face presentation attack detection (PAD). Whereas such techniques have focused on improving PAD performance in terms of classification accuracy and robustness against unseen attacks and environmental conditions, little attention has been paid to the explainability of PAD predictions. In this paper, we tackle the problem of explaining PAD predictions through natural language. Our approach passes feature representations of a deep layer of the PAD model to a language model to generate text describing the reasoning behind the PAD prediction. Due to the limited amount of annotated data in our study, we apply a lightweight LSTM network as our natural language generation model. We investigate how the quality of the generated explanations is affected by different loss functions, including the commonly used word-wise cross-entropy loss, a sentence discriminative loss, and a sentence semantic loss. We perform our experiments using face images from a dataset consisting of 1,105 bona-fide and 924 presentation attack samples. Our quantitative and qualitative results show the effectiveness of our model for generating proper PAD explanations through text as well as the power of the sentence-wise losses. To the best of our knowledge, this is the first introduction of a joint biometrics-NLP task. Our dataset can be obtained through our GitHub page.
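The generation step can be pictured as a small conditional LSTM decoder whose hidden state is seeded by the PAD feature vector. A minimal PyTorch sketch, not the paper's configuration; dimensions and vocabulary size are illustrative assumptions:

```python
import torch
import torch.nn as nn

class PADExplainer(nn.Module):
    """Lightweight LSTM decoder that turns a PAD feature vector into a
    textual explanation. All sizes here are illustrative."""
    def __init__(self, feat_dim=512, vocab=1000, emb=128, hid=256):
        super().__init__()
        self.init_h = nn.Linear(feat_dim, hid)   # PAD features seed the hidden state
        self.embed = nn.Embedding(vocab, emb)
        self.lstm = nn.LSTM(emb, hid, batch_first=True)
        self.out = nn.Linear(hid, vocab)

    def forward(self, pad_features, tokens):
        h0 = torch.tanh(self.init_h(pad_features)).unsqueeze(0)
        c0 = torch.zeros_like(h0)
        y, _ = self.lstm(self.embed(tokens), (h0, c0))
        return self.out(y)                        # word logits for cross-entropy loss

model = PADExplainer()
logits = model(torch.randn(4, 512), torch.randint(0, 1000, (4, 12)))
print(logits.shape)  # torch.Size([4, 12, 1000])
```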

【33】 Hybrid BYOL-ViT: Efficient approach to deal with small Datasets Link: https://arxiv.org/abs/2111.04845

Authors: Safwen Naimi, Rien van Leeuwen, Wided Souidene, Slim Ben Saoud. Comments: 19 pages, 8 figures and 16 tables. Abstract: Supervised learning can learn large representational spaces, which are crucial for handling difficult learning tasks. However, due to the design of the model, classical image classification approaches struggle to generalize to new problems and new situations when dealing with small datasets. In fact, supervised learning can lose the location of image features, which leads to supervision collapse in very deep architectures. In this paper, we investigate how self-supervision with strong and sufficient augmentation of unlabeled data can effectively train the first layers of a neural network even better than supervised learning, with no need for millions of labeled data. The main goal is to disconnect pixel data from annotation by learning generic, task-agnostic low-level features. Furthermore, we look into Vision Transformers (ViT) and show that the low-level features derived from a self-supervised architecture can improve the robustness and the overall performance of this emerging architecture. We evaluated our method on STL-10, one of the smallest open-source datasets, and obtained a significant performance boost from 41.66% to 83.25% when feeding low-level features from a self-supervised learning architecture to the ViT instead of the raw images.
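One rough way to picture the hybrid is a ViT-style encoder that consumes low-level feature maps instead of raw pixels. In the sketch below the frozen conv stem merely stands in for the first layers of a BYOL-pretrained encoder; all sizes are illustrative assumptions, and positional embeddings are omitted for brevity:

```python
import torch
import torch.nn as nn

class FeatureViT(nn.Module):
    """ViT-style classifier over low-level feature maps rather than raw
    pixels. The conv stem stands in for the first (frozen) layers of a
    self-supervised encoder; sizes are illustrative, not the paper's."""
    def __init__(self, n_classes=10, dim=192):
        super().__init__()
        self.stem = nn.Sequential(                  # frozen low-level extractor
            nn.Conv2d(3, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, dim, 3, stride=2, padding=1),
        )
        for p in self.stem.parameters():
            p.requires_grad = False
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):
        f = self.stem(x)                            # (B, dim, H/4, W/4)
        tokens = f.flatten(2).transpose(1, 2)       # each spatial cell = one token
        tokens = torch.cat([self.cls.expand(len(x), -1, -1), tokens], dim=1)
        return self.head(self.encoder(tokens)[:, 0])  # classify from the CLS token

print(FeatureViT()(torch.randn(2, 3, 96, 96)).shape)  # torch.Size([2, 10])
```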

【34】 Efficient estimates of optimal transport via low-dimensional embeddings Link: https://arxiv.org/abs/2111.04838

Authors: Patric M. Fulop, Vincent Danos. Affiliation: School of Informatics, University of Edinburgh, EH8 9AB. Comments: NeurIPS 2021 Optimal Transport and Machine Learning Workshop. Abstract: Optimal transport distances (OT) have been widely used in recent machine learning work as ways to compare probability distributions. They are costly to compute when the data live in high dimension. Recent work by Paty et al., 2019, aims specifically at reducing this cost by computing OT using low-rank projections of the data (seen as discrete measures). We extend this approach and show that one can approximate OT distances by using more general families of maps, provided they are 1-Lipschitz. The best estimate is obtained by maximising OT over the given family. As OT calculations are done after mapping data to a lower-dimensional space, our method scales well with the original data dimension. We demonstrate the idea with neural networks.
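The estimate is a lower bound: pushing both samples through a 1-Lipschitz map can only shrink squared-Euclidean costs, so OT computed in the projected space never exceeds the original OT, and maximising over the family tightens the bound. A minimal sketch with the POT library, using random linear maps rescaled to spectral norm 1 in place of the paper's learned (neural-network) maps:

```python
import numpy as np
import ot  # POT: pip install pot

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 50))            # source samples in dimension 50
Y = rng.normal(size=(200, 50)) + 0.5      # shifted target samples

def random_1lipschitz_map(d_in, d_out):
    """Linear map rescaled to spectral norm 1, hence 1-Lipschitz."""
    W = rng.normal(size=(d_in, d_out))
    return W / np.linalg.norm(W, 2)

a = b = np.full(200, 1 / 200)             # uniform weights on the samples
best = 0.0
for _ in range(32):                       # crude random search over the family
    W = random_1lipschitz_map(50, 2)
    M = ot.dist(X @ W, Y @ W)             # squared-Euclidean cost in 2-D
    best = max(best, ot.emd2(a, b, M))    # exact OT cost in the projected space
print("lower bound on the OT cost:", best)
```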

【35】 Solving Marginal MAP Exactly by Probabilistic Circuit Transformations Link: https://arxiv.org/abs/2111.04833

Authors: YooJung Choi, Tal Friedman, Guy Van den Broeck. Affiliation: Computer Science Department, University of California, Los Angeles. Abstract: Probabilistic circuits (PCs) are a class of tractable probabilistic models that allow efficient, often linear-time, inference of queries such as marginals and most probable explanations (MPE). However, marginal MAP, which is central to many decision-making problems, remains a hard query for PCs unless they satisfy highly restrictive structural constraints. In this paper, we develop a pruning algorithm that removes parts of the PC that are irrelevant to a marginal MAP query, shrinking the PC while maintaining the correct solution. This pruning technique is so effective that we are able to build a marginal MAP solver based solely on iteratively transforming the circuit - no search is required. We empirically demonstrate the efficacy of our approach on real-world datasets.

【36】 Deep Learning Approach for Aggressive Driving Behaviour Detection Link: https://arxiv.org/abs/2111.04794

Authors: Farid Talebloo, Emad A. Mohammed, Behrouz Far. Affiliation: Department of Electrical and Software Engineering, University of Calgary, Calgary, Alberta, Canada. Abstract: Driving behaviour is one of the primary causes of road crashes and accidents, and these can be decreased by identifying and minimizing aggressive driving behaviour. This study identifies the timesteps at which a driver in different circumstances (rush, mental conflict, reprisal) begins to drive aggressively. An observer (real or virtual) is needed to examine driving behaviour and discover occasions of aggressive driving; we overcome this problem by using a smartphone's GPS sensor to detect locations and classify drivers' driving behaviour every three minutes. To detect time-series patterns in our dataset, we employ RNN (GRU, LSTM) algorithms to identify patterns over the driving course. The algorithm is independent of road, vehicle, position, or driver characteristics. We conclude that three minutes (or more) of driving (120 seconds of GPS data) is sufficient to identify driver behaviour. The results show high accuracy and a high F1 score.
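A sequence classifier of the kind described is easy to sketch. Below, a two-layer GRU labels 120-step windows of GPS-derived features; the choice of features and all sizes are illustrative assumptions, not the paper's setup:

```python
import torch
import torch.nn as nn

class DrivingBehaviourGRU(nn.Module):
    """GRU classifier over fixed windows of GPS-derived features
    (e.g. speed and heading change per second). The 120-step window
    mirrors the '120 seconds of GPS data' setting."""
    def __init__(self, n_features=2, hidden=64, n_classes=2):
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, num_layers=2, batch_first=True)
        self.head = nn.Linear(hidden, n_classes)  # aggressive vs. normal

    def forward(self, x):            # x: (batch, 120, n_features)
        _, h = self.gru(x)
        return self.head(h[-1])      # logits from the last layer's final state

model = DrivingBehaviourGRU()
print(model(torch.randn(8, 120, 2)).shape)   # torch.Size([8, 2])
```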

【37】 Visual Question Answering based on Formal Logic Link: https://arxiv.org/abs/2111.04785

Authors: Muralikrishnna G. Sethuraman, Ali Payani, Faramarz Fekri, J. Clayton Kerce. Affiliations: School of Electrical and Computer Engineering, Georgia Institute of Technology, Atlanta, USA; Cisco, California, USA; Georgia Tech Research Institute. Abstract: Visual question answering (VQA) has been gaining a lot of traction in the machine learning community in recent years due to the challenges posed in understanding information coming from multiple modalities (i.e., images, language). In VQA, a series of questions are posed based on a set of images, and the task at hand is to arrive at the answer. To achieve this, we take a symbolic-reasoning-based approach using the framework of formal logic. The image and the questions are converted into symbolic representations on which explicit reasoning is performed. We propose a formal logic framework where (i) images are converted to logical background facts with the help of scene graphs, (ii) the questions are translated to first-order predicate logic clauses using a transformer-based deep learning model, and (iii) satisfiability checks are performed, using the background knowledge and the grounding of predicate clauses, to obtain the answer. Our proposed method is highly interpretable, and each step in the pipeline can be easily analyzed by a human. We validate our approach on the CLEVR and the GQA datasets. We achieve near-perfect accuracy of 99.6% on the CLEVR dataset, comparable to state-of-the-art models, showcasing that formal logic is a viable tool to tackle visual question answering. Our model is also data efficient, achieving 99.1% accuracy on the CLEVR dataset when trained on just 10% of the training data.
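Step (iii) reduces to grounding: substitute candidate objects for the question's variables and test each atom against the scene-graph facts. A toy sketch of that step only; the fact and predicate names are invented, and the clause translation of step (ii) is done by a transformer in the paper, not shown here:

```python
# Scene graph rendered as logical background facts (step i).
facts = {
    ("Sphere", "obj1"), ("Cube", "obj2"),
    ("Red", "obj1"), ("LeftOf", "obj1", "obj2"),
}

def holds(atom, binding):
    """Check one predicate atom under a variable binding."""
    pred, *args = atom
    grounded = tuple(binding.get(a, a) for a in args)
    return (pred, *grounded) in facts

def satisfiable(clause, objects):
    """Naive grounding of an existentially quantified conjunction
    (step iii): try every object for the variable and test all atoms."""
    for obj in objects:
        if all(holds(atom, {"x": obj}) for atom in clause):
            return True
    return False

# "Is there a red sphere left of the cube?" as atoms over a variable x:
clause = [("Sphere", "x"), ("Red", "x"), ("LeftOf", "x", "obj2")]
print(satisfiable(clause, ["obj1", "obj2"]))   # True
```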

【38】 ML-EXray: Visibility into ML Deployment on the Edge Link: https://arxiv.org/abs/2111.04779

Authors: Hang Qiu, Ioanna Vavelidou, Jian Li, Evgenya Pergament, Pete Warden, Sandeep Chinchali, Zain Asgar, Sachin Katti. Abstract: Benefiting from expanding cloud infrastructure, deep neural networks (DNNs) today achieve increasingly high performance when trained in the cloud. Researchers spend months of effort competing for an extra few percentage points of model accuracy. However, when these models are actually deployed on edge devices in practice, performance can abruptly drop by over 10% without obvious reasons. The key challenge is that there is not much visibility into ML inference execution on edge devices, and very little awareness of potential issues during the edge deployment process. We present ML-EXray, an end-to-end framework which provides visibility into layer-level details of the ML execution and helps developers analyze and debug cloud-to-edge deployment issues. More often than not, the reason for sub-optimal edge performance lies not only in the model itself, but in every operation throughout the data flow and the deployment process. Evaluations show that ML-EXray can effectively catch deployment issues such as pre-processing bugs, quantization issues, and suboptimal kernels. Using ML-EXray, users need to write fewer than 15 lines of code to fully examine the edge deployment pipeline. Eradicating these issues, ML-EXray can correct model performance by up to 30%, pinpoint error-prone layers, and guide users to optimize kernel execution latency by two orders of magnitude. Code and APIs will be released as an open-source multi-lingual instrumentation library and a Python deployment validation library.
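The core layer-level diffing idea can be sketched without the library itself: trace per-layer outputs on a reference pipeline and on the edge pipeline, then report the first layer where they diverge. The ML-EXray API is not reproduced here; the sketch uses plain functions and a deliberately mismatched pre-processing step as the injected bug:

```python
import numpy as np

def layer_outputs(model_layers, x):
    """Run an input through a list of layer functions, logging every
    intermediate output (the kind of per-layer trace ML-EXray collects)."""
    outs = []
    for layer in model_layers:
        x = layer(x)
        outs.append(x.copy())
    return outs

def compare_traces(ref_trace, edge_trace, tol=1e-2):
    """Report the first layer where reference and edge executions diverge,
    e.g. due to a pre-processing bug or aggressive quantization."""
    for i, (r, e) in enumerate(zip(ref_trace, edge_trace)):
        err = np.max(np.abs(r - e))
        print(f"layer {i}: max abs diff = {err:.4f}")
        if err > tol:
            return i
    return None

relu = lambda x: np.maximum(x, 0)
ref_layers  = [lambda x: x / 255.0, relu]          # reference pre-processing
edge_layers = [lambda x: x / 128.0 - 1.0, relu]    # mismatched edge pre-processing
x = np.linspace(0, 255, 8)
bad = compare_traces(layer_outputs(ref_layers, x), layer_outputs(edge_layers, x))
print("first divergent layer:", bad)
```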

【39】 Use of 1D-CNN for input data size reduction of LSTM in Hourly Rainfall-Runoff modeling Link: https://arxiv.org/abs/2111.04732

Authors: Kei Ishida, Ali Ercan, Takeyoshi Nagasato, Masato Kiyama, Motoki Amagasaki. Affiliations: International Research Organization for Advanced Science and Technology, Kumamoto University; Center for Water Cycle, Marine Environment, and Disaster Management, Kumamoto University, Kurokami. Comments: 18 pages, 9 figures. Abstract: This study proposes an architecture consisting of a serial coupling of a one-dimensional convolutional neural network (1D-CNN) and a long short-term memory (LSTM) network, referred to as CNNsLSTM, for hourly-scale rainfall-runoff modeling. In CNNsLSTM, the CNN component receives hourly meteorological time-series data for a long duration, and the LSTM component then receives the features extracted by the 1D-CNN together with hourly meteorological time-series data for a short duration. As a case study, CNNsLSTM was implemented for hourly rainfall-runoff modeling at the Ishikari River watershed, Japan. The meteorological dataset, consisting of precipitation, air temperature, evapotranspiration, and long- and short-wave radiation, was utilized as input, and river flow was used as the target data. To evaluate the performance of the proposed CNNsLSTM, its results were compared with those of a 1D-CNN, an LSTM with only hourly inputs (LSTMwHour), a parallel architecture of 1D-CNN and LSTM (CNNpLSTM), and an LSTM architecture using both daily and hourly input data (LSTMwDpH). CNNsLSTM showed clear improvements in estimation accuracy over the three conventional architectures (1D-CNN, LSTMwHour, and CNNpLSTM) and the recently proposed LSTMwDpH. In comparison to observed flows, the medians of the NSE values for the test period are 0.455-0.469 for 1D-CNN (based on NCHF=8, 16, and 32, the numbers of channels of the feature map of the first CNN layer), 0.639-0.656 for CNNpLSTM (based on NCHF=8, 16, and 32), 0.745 for LSTMwHour, 0.831 for LSTMwDpH, and 0.865-0.873 for CNNsLSTM (based on NCHF=8, 16, and 32). Furthermore, the proposed CNNsLSTM reduces the median RMSE of 1D-CNN by 50.2%-51.4%, of CNNpLSTM by 37.4%-40.8%, of LSTMwHour by 27.3%-29.5%, and of LSTMwDpH by 10.6%-13.4%.
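The serial coupling is straightforward to express in PyTorch. A minimal sketch, not the authors' exact model: window lengths, kernel sizes, and channel counts (n_chf loosely mirrors the paper's NCHF) are illustrative assumptions:

```python
import torch
import torch.nn as nn

class CNNsLSTM(nn.Module):
    """Serial coupling: a 1D-CNN condenses a long hourly meteorological
    history into features; an LSTM then consumes those features together
    with the recent hourly inputs and predicts river flow."""
    def __init__(self, n_met=5, n_chf=16, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(n_met, n_chf, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv1d(n_chf, n_chf * 2, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),
        )
        self.lstm = nn.LSTM(n_met + n_chf * 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)           # hourly river flow

    def forward(self, x_long, x_short):
        # x_long: (B, n_met, T_long) long history; x_short: (B, T_short, n_met)
        f = self.cnn(x_long).squeeze(-1)            # (B, n_chf*2) summary features
        f = f.unsqueeze(1).expand(-1, x_short.size(1), -1)
        y, _ = self.lstm(torch.cat([x_short, f], dim=-1))
        return self.head(y[:, -1])

model = CNNsLSTM()
print(model(torch.randn(2, 5, 720), torch.randn(2, 24, 5)).shape)  # (2, 1)
```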

【40】 Survey of Deep Learning Methods for Inverse Problems Link: https://arxiv.org/abs/2111.04731

Authors: Shima Kamyab, Zohreh Azimifar, Rasool Sabzi, Paul Fieguth. Affiliations: Dept. of Comp. Sci. and Eng., Shiraz University, Shiraz, Iran; Dept. of Systems Design Engineering, University of Waterloo, Waterloo, Canada. Abstract: In this paper we investigate a variety of deep learning strategies for solving inverse problems. We classify existing deep learning solutions for inverse problems into three categories: Direct Mapping, Data Consistency Optimizer, and Deep Regularizer. We choose a sample of each inverse problem type so as to compare the robustness of the three categories, and report a statistical analysis of their differences. We perform extensive experiments on the classic problem of linear regression and on three well-known inverse problems in computer vision, namely image denoising, 3D human face inverse rendering, and object tracking, selected as representative prototypes for each class of inverse problems. The overall results and the statistical analyses show that the solution categories have a robustness behaviour dependent on the type of inverse problem domain, and specifically dependent on whether or not the problem includes measurement outliers. Based on our experimental results, we conclude by proposing the most robust solution category for each inverse problem class.

【41】 A Deep Learning Technique using Low Sampling rate for residential Non Intrusive Load Monitoring Link: https://arxiv.org/abs/2111.05120

Authors: Ronak Aghera, Sahil Chilana, Vishal Garg, Raghunath Reddy. Affiliation: International Institute of Information Technology, Hyderabad. Abstract: Feedback on individual device loads and energy consumption is one of the important approaches for persuading users to save energy in residences. It can help in identifying faulty devices and energy wasted by devices left on while unused. The main challenge is to identify and estimate the energy consumption of individual devices without intrusive sensors on each device. Non-intrusive load monitoring (NILM), or energy disaggregation, is a blind source separation problem which requires a system to estimate the electricity usage of individual appliances from the aggregated household energy consumption. In this paper, we propose a novel deep neural network-based approach for performing load disaggregation on low-frequency power data obtained from residential households. We combine a series of one-dimensional Convolutional Neural Networks and a Long Short-Term Memory network (1D CNN-LSTM) to extract features that can identify active appliances and retrieve their power consumption given the aggregated household power value. We use CNNs to extract features from the mains readings in a given time frame, and then use those features to classify whether a given appliance is active in that time period. Following that, the extracted features are used to model a generation problem using an LSTM, which we train to generate the disaggregated energy consumption of a particular appliance. Our neural network is capable of generating detailed demand-side feedback, providing vital insights to the end-user about their electricity consumption. The algorithm was designed for low-power offline devices such as the ESP32. Empirical calculations show that our model outperforms the state of the art on the Reference Energy Disaggregation Dataset (REDD).
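The two roles of the extracted features (appliance-state classification and power-trace generation) can be sketched as a shared 1D CNN-LSTM trunk with two heads. A minimal sketch; sizes are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

class NILMNet(nn.Module):
    """1D CNN-LSTM disaggregator: CNN features over a window of mains
    readings feed an LSTM; one head says whether the target appliance is
    active, the other generates its disaggregated power trace."""
    def __init__(self, hidden=64):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, padding=3), nn.ReLU(),
        )
        self.lstm = nn.LSTM(32, hidden, batch_first=True)
        self.on_off = nn.Linear(hidden, 1)    # appliance active in the window?
        self.power = nn.Linear(hidden, 1)     # per-step disaggregated power

    def forward(self, mains):                 # mains: (B, T) aggregate watts
        f = self.cnn(mains.unsqueeze(1))      # (B, 32, T), length preserved
        y, (h, _) = self.lstm(f.transpose(1, 2))
        return self.on_off(h[-1]), self.power(y).squeeze(-1)

net = NILMNet()
state_logit, trace = net(torch.randn(4, 600))
print(state_logit.shape, trace.shape)        # (4, 1) (4, 600)
```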

【42】 GDCA: GAN-based single image super resolution with Dual discriminators and Channel Attention Link: https://arxiv.org/abs/2111.05014

Authors: Thanh Nguyen, Hieu Hoang, Chang D. Yoo. Affiliation: Korea Advanced Institute of Science and Technology (KAIST). Abstract: Single Image Super-Resolution (SISR) is a very active research field. This paper addresses SISR by using a GAN-based approach with dual discriminators and incorporating an attention mechanism. The experimental results show that GDCA can generate sharper and more pleasing images compared to other conventional methods.
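The channel-attention ingredient can be sketched as a squeeze-and-excitation style gate over feature channels; this is a generic stand-in rather than GDCA's exact module, and the dual-discriminator GAN loss is not reproduced here. Reduction ratio and channel count are illustrative:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation style channel attention of the kind used
    inside super-resolution generators: global pooling summarises each
    channel, and a small bottleneck re-weights the channels."""
    def __init__(self, channels=64, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(x)    # per-channel gates broadcast over H, W

x = torch.randn(2, 64, 32, 32)
print(ChannelAttention()(x).shape)   # torch.Size([2, 64, 32, 32])
```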

【43】 Solving PDE-constrained Control Problems using Operator Learning Link: https://arxiv.org/abs/2111.04941

Authors: Rakhoon Hwang, Jae Yong Lee, Jin Young Shin, Hyung Ju Hwang. Affiliation: Department of Mathematics, Pohang University of Science and Technology, Cheongam-ro, Pohang, Republic of Korea. Comments: 15 pages, 12 figures. This paper is under review for AAAI 2022. Abstract: The modeling and control of complex physical dynamics are essential in real-world problems. We propose a novel framework that is generally applicable to solving PDE-constrained optimal control problems by introducing surrogate models for PDE solution operators with special regularizers. The procedure of the proposed framework is divided into two phases: solution operator learning for PDE constraints (Phase 1) and searching for the optimal control (Phase 2). Once the surrogate model is trained in Phase 1, the optimal control can be inferred in Phase 2 without intensive computations. Our framework can be applied to both data-driven and data-free cases. We demonstrate the successful application of our method to various optimal control problems for different control variables with diverse PDE constraints, from the Poisson equation to Burgers' equation.
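The two-phase procedure can be pictured as follows: once a differentiable surrogate for the solution operator exists, Phase 2 is ordinary gradient descent on the control through the frozen surrogate. In the sketch below the surrogate is an untrained stand-in network and the objective is an invented tracking cost, so it illustrates only the mechanics:

```python
import torch
import torch.nn as nn

# Phase 1 (assumed done): a surrogate G mapping a control vector to the
# PDE solution on a grid. Here G is an untrained stand-in network.
G = nn.Sequential(nn.Linear(16, 64), nn.Tanh(), nn.Linear(64, 100))
for p in G.parameters():
    p.requires_grad = False               # the surrogate is frozen in Phase 2

target = torch.zeros(100)                 # desired state, e.g. drive the solution to 0
u = torch.zeros(16, requires_grad=True)   # control variable being optimised
opt = torch.optim.Adam([u], lr=1e-2)

# Phase 2: search for the optimal control by gradient descent through
# the differentiable surrogate -- no PDE solves inside the loop.
for step in range(200):
    opt.zero_grad()
    loss = ((G(u) - target) ** 2).mean() + 1e-3 * (u ** 2).mean()  # tracking + control cost
    loss.backward()
    opt.step()
print("final objective:", loss.item())
```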

【44】 Lymph Node Detection in T2 MRI with Transformers Link: https://arxiv.org/abs/2111.04885

Authors: Tejas Sudharshan Mathai, Sungwon Lee, Daniel C. Elton, Thomas C. Shen, Yifan Peng, Zhiyong Lu, Ronald M. Summers. Affiliations: Imaging Biomarkers and Computer-Aided Diagnosis Laboratory, Radiology and Imaging Sciences, Clinical Center, National Institutes of Health, Bethesda, MD, USA; National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health. Comments: Accepted at SPIE 2022. Abstract: Identification of lymph nodes (LN) in T2 Magnetic Resonance Imaging (MRI) is an important step performed by radiologists during the assessment of lymphoproliferative diseases. The size of the nodes plays a crucial role in their staging, and radiologists sometimes use an additional contrast sequence such as diffusion-weighted imaging (DWI) for confirmation. However, lymph nodes have diverse appearances in T2 MRI scans, making metastasis staging difficult. Furthermore, radiologists often miss smaller metastatic lymph nodes over the course of a busy day. To deal with these issues, we propose to use the DEtection TRansformer (DETR) network to localize suspicious metastatic lymph nodes for staging in challenging T2 MRI scans acquired with different scanners and exam protocols. False positives (FP) were reduced through a bounding box fusion technique, and a precision of 65.41% and a sensitivity of 91.66% at 4 FP per image were achieved. To the best of our knowledge, our results improve upon the current state of the art for lymph node detection in T2 MRI scans.
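One simple flavour of bounding-box fusion (not necessarily the paper's exact variant) greedily groups overlapping detections by IoU and averages them, collapsing duplicate hits that would otherwise count as false positives:

```python
def iou(a, b):
    """Intersection over union of two [x1, y1, x2, y2] boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def fuse_boxes(boxes, scores, thr=0.5):
    """Greedy fusion: overlapping detections are averaged into one box."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    fused, used = [], set()
    for i in order:
        if i in used:
            continue
        group = [j for j in order if j not in used and iou(boxes[i], boxes[j]) >= thr]
        used.update(group)
        avg = [sum(boxes[j][k] for j in group) / len(group) for k in range(4)]
        fused.append((avg, max(scores[j] for j in group)))
    return fused

dets = [[10, 10, 50, 50], [12, 11, 52, 49], [80, 80, 120, 120]]
print(fuse_boxes(dets, [0.9, 0.7, 0.6]))   # the first two boxes merge into one
```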

【45】 BRACS: A Dataset for BReAst Carcinoma Subtyping in H&E Histology Images Link: https://arxiv.org/abs/2111.04740

Authors: Nadia Brancati, Anna Maria Anniciello, Pushpak Pati, Daniel Riccio, Giosuè Scognamiglio, Guillaume Jaume, Giuseppe De Pietro, Maurizio Di Bonito, Antonio Foncubierta, Gerardo Botti, Maria Gabrani, Florinda Feroce, Maria Frucci. Affiliations: Institute for High Performance Computing and Networking of the Research Council of Italy (ICAR-CNR), Naples; National Cancer Institute - IRCCS - Fondazione Pascale, Naples, Italy; IBM Research - Zurich, Switzerland. Comments: 10 pages, 3 figures, 8 tables, 30 references. Abstract: Breast cancer is the most commonly diagnosed cancer and registers the highest number of cancer deaths among women. Recent advancements in diagnostic activities combined with large-scale screening policies have significantly lowered mortality rates for breast cancer patients. However, manual inspection of tissue slides by pathologists is cumbersome, time-consuming, and subject to significant inter- and intra-observer variability. Recently, the advent of whole-slide scanning systems has enabled the rapid digitization of pathology slides and the development of digital workflows. These advances further make it possible to leverage Artificial Intelligence (AI) to assist, automate, and augment pathological diagnosis. But AI techniques, especially Deep Learning (DL), require a large amount of high-quality annotated data to learn from. Constructing such task-specific datasets poses several challenges, such as data-acquisition-level constraints, time-consuming and expensive annotation, and anonymization of private information. In this paper, we introduce the BReAst Carcinoma Subtyping (BRACS) dataset, a large cohort of annotated Hematoxylin & Eosin (H&E)-stained images to facilitate the characterization of breast lesions. BRACS contains 547 Whole-Slide Images (WSIs) and 4539 Regions of Interest (ROIs) extracted from the WSIs. Each WSI and its ROIs are annotated by the consensus of three board-certified pathologists into different lesion categories. Specifically, BRACS includes three lesion types, i.e., benign, malignant, and atypical, which are further subtyped into seven categories. It is, to the best of our knowledge, the largest annotated dataset for breast cancer subtyping at both the WSI and ROI level. Further, by including the understudied atypical lesions, BRACS offers a unique opportunity for leveraging AI to better understand their characteristics.

【46】 Mixed Transformer U-Net For Medical Image Segmentation Link: https://arxiv.org/abs/2111.04734

Authors: Hongyi Wang, Shiao Xie, Lanfen Lin, Yutaro Iwamoto, Xian-Hua Han, Yen-Wei Chen, Ruofeng Tong. Affiliations: College of Computer Science and Technology, Zhejiang University, China; College of Information Science and Engineering, Ritsumeikan University, Japan; Artificial Intelligence Research Center, Yamaguchi University, Japan. Abstract: Though U-Net has achieved tremendous success in medical image segmentation tasks, it lacks the ability to explicitly model long-range dependencies. Vision Transformers have therefore recently emerged as alternative segmentation structures, owing to their innate ability to capture long-range correlations through Self-Attention (SA). However, Transformers usually rely on large-scale pre-training and have high computational complexity. Furthermore, SA can only model self-affinities within a single sample, ignoring the potential correlations across the overall dataset. To address these problems, we propose a novel Transformer module named Mixed Transformer Module (MTM) for simultaneous inter- and intra-affinity learning. MTM first calculates self-affinities efficiently through our well-designed Local-Global Gaussian-Weighted Self-Attention (LGG-SA). Then, it mines inter-connections between data samples through External Attention (EA). Using MTM, we construct a U-shaped model named Mixed Transformer U-Net (MT-UNet) for accurate medical image segmentation. We test our method on two different public datasets, and the experimental results show that the proposed method achieves better performance than other state-of-the-art methods. The code is available at: https://github.com/Dootmaan/MT-UNet.
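The External Attention component is compact enough to sketch in full: two small learnable memories shared across the whole dataset replace the per-sample keys and values of self-attention, with the double normalisation of Guo et al. (2021). This follows the generic EA formulation rather than MT-UNet's exact instantiation, and the memory size S is an illustrative choice:

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """External Attention: learnable key/value memories shared across
    the dataset, so correlations between samples can be captured."""
    def __init__(self, dim=64, S=32):
        super().__init__()
        self.mk = nn.Linear(dim, S, bias=False)   # external key memory
        self.mv = nn.Linear(S, dim, bias=False)   # external value memory

    def forward(self, x):                # x: (B, N, dim) token features
        attn = self.mk(x)                # (B, N, S) affinities to memory slots
        attn = torch.softmax(attn, dim=1)                      # normalise over tokens
        attn = attn / (attn.sum(dim=-1, keepdim=True) + 1e-9)  # l1 norm over slots
        return self.mv(attn)             # (B, N, dim)

x = torch.randn(2, 196, 64)
print(ExternalAttention()(x).shape)      # torch.Size([2, 196, 64])
```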

