
人工智能学术速递[6.23]

公众号-arXiv每日学术速递
发布2021-07-02 18:23:32

访问www.arxivdaily.com获取含摘要速递,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏、发帖等功能!点击阅读原文即可访问

cs.AI人工智能,共计56篇

【1】 Tracking Instances as Queries 标题:将实例作为查询进行跟踪

作者:Shusheng Yang,Yuxin Fang,Xinggang Wang,Yu Li,Ying Shan,Bin Feng,Wenyu Liu 机构:School of EIC, Huazhong University of Science & Technology, Applied Research Center (ARC), Tencent PCG 备注:None 链接:https://arxiv.org/abs/2106.11963 摘要:近年来,基于查询的深度网络由于其端到端的流水线结构和在目标检测、语义分割、实例分割等基本计算机视觉任务中的竞争优势而受到广泛关注。然而,如何建立一个架构优雅、性能强大的基于查询的视频实例分割(VIS)框架还有待解决。在本文中,我们提出了QueryTrack(即把实例作为查询来跟踪),这是一个统一的基于查询的VIS框架,充分利用了QueryInst中实例和查询之间固有的一一对应关系。该方法在YouTube-VIS-2019/2021数据集上获得52.7/52.3 AP,凭借单一在线端到端模型、单尺度测试和适量的训练数据,在CVPR 2021的YouTube-VIS挑战赛中获得第二名。我们还在YouTube-VIS-2021数据集上提供QueryTrack-ResNet-50基线结果,作为VIS社区的参考。 摘要:Recently, query based deep networks catch lots of attention owing to their end-to-end pipeline and competitive results on several fundamental computer vision tasks, such as object detection, semantic segmentation, and instance segmentation. However, how to establish a query based video instance segmentation (VIS) framework with elegant architecture and strong performance remains to be settled. In this paper, we present \textbf{QueryTrack} (i.e., tracking instances as queries), a unified query based VIS framework fully leveraging the intrinsic one-to-one correspondence between instances and queries in QueryInst. The proposed method obtains 52.7 / 52.3 AP on YouTube-VIS-2019 / 2021 datasets, which wins the 2-nd place in the YouTube-VIS Challenge at CVPR 2021 \textbf{with a single online end-to-end model, single scale testing \& modest amount of training data}. We also provide QueryTrack-ResNet-50 baseline results on YouTube-VIS-2021 dataset as references for the VIS community.
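下面给出一个极简示意片段(并非论文实现):用匈牙利算法在"查询嵌入"与"实例嵌入"之间建立一一匹配,以帮助理解摘要中"实例与查询的一一对应"这一核心思想。其中的向量维度、相似度定义和随机数据均为演示用的假设。

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# 假设:5 个查询嵌入与 5 个实例嵌入(维度 8 仅为演示)
queries = rng.normal(size=(5, 8))
instances = rng.normal(size=(5, 8))

# 余弦相似度矩阵;代价取负相似度,用匈牙利算法求最小代价的一一匹配
q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
k = instances / np.linalg.norm(instances, axis=1, keepdims=True)
sim = q @ k.T
row, col = linear_sum_assignment(-sim)

for r, c in zip(row, col):
    print(f"query {r} <-> instance {c}, sim={sim[r, c]:.3f}")
```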

【2】 Physics-Informed Deep Reversible Regression Model for Temperature Field Reconstruction of Heat-Source Systems 标题:热源系统温度场重构的物理信息深度可逆回归模型

作者:Zhiqiang Gong,Weien Zhou,Jun Zhang,Wei Peng,Wen Yao 机构:National Innovation Institute of Defense Technology 链接:https://arxiv.org/abs/2106.11929 摘要:在工程系统中,为了保证热源的正常工作乃至长寿命,对热源部件寿命期内的温度进行监测是必不可少的。然而,以往的方法主要采用插值估计,需要大量的温度张量才能得到精确的估计。为了解决这一问题,本文提出了一种新的基于物理信息的温度场重构深度代理模型。首先,定义了热源系统的温度场重建任务。然后,本文针对所提出的任务开发了深度代理模型映射。最后,考虑到传热的物理性质,本文提出了四种不同的损失,并结合这些损失对深度代理模型进行联合学习。对典型的二维热源系统进行了实验研究,验证了所提出的基于物理信息的深度代理模型重建温度场的有效性和高效性。 摘要:Temperature monitoring during the life time of heat-source components in engineering systems becomes essential to ensure the normal work and even the long working life of the heat sources. However, prior methods, which mainly use the interpolate estimation, require large amounts of temperature tensors for an accurate estimation. To solve this problem, this work develops a novel physics-informed deep surrogate models for temperature field reconstruction. First, we defines the temperature field reconstruction task of heat-source systems. Then, this work develops the deep surrogate model mapping for the proposed task. Finally, considering the physical properties of heat transfer, this work proposes four different losses and joint learns the deep surrogate model with these losses. Experimental studies have conducted over typical two-dimensional heat-source systems to demonstrate the effectiveness and efficiency of the proposed physics-informed deep surrogate models for temperature field reconstruction.
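作为理解"物理信息损失"的一个极简示意(并非论文中的四种损失本身),下面用有限差分近似二维稳态热传导方程 ∇²T + q = 0 的残差,并以残差的均方值作为物理损失项;网格大小与热源布置均为演示假设。

```python
import numpy as np

def physics_residual_loss(T, q, h=1.0):
    """有限差分近似 laplacian(T) + q,返回内部节点残差的均方值。"""
    lap = (T[:-2, 1:-1] + T[2:, 1:-1] + T[1:-1, :-2] + T[1:-1, 2:]
           - 4.0 * T[1:-1, 1:-1]) / h**2
    residual = lap + q[1:-1, 1:-1]
    return np.mean(residual ** 2)

# 假设:21x21 的温度场(此处用随机数代替网络输出)和一个位于中心的热源
T = np.random.rand(21, 21)
q = np.zeros((21, 21))
q[10, 10] = 1.0
print("physics-informed loss:", physics_residual_loss(T, q))
```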

【3】 PALMAR: Towards Adaptive Multi-inhabitant Activity Recognition in Point-Cloud Technology 标题:PALMAR:面向点云技术中的自适应多居民活动识别

作者:Mohammad Arif Ul Alam,Md Mahmudur Rahman,Jared Q Widberg 机构:Department of Computer Science, University of Massachusetts Lowell, MA, USA 备注:Accepted in IEEE International Conference on Computer Communications 2021 链接:https://arxiv.org/abs/2106.11902 摘要:随着深度神经网络和基于计算机视觉的人类活动识别技术的发展,点云数据技术(LiDAR、mmWave)因其具有隐私保护的特性而受到广泛关注。鉴于精确的PCD技术的高前景,我们开发了PALMAR,一个多居民活动识别系统,通过使用有效的信号处理和新的机器学习技术来跟踪个人,从而开发了一个自适应的多居民跟踪和HAR系统。更具体地说,我们提出(i)基于体素化特征表示的实时PCD微调方法,(ii)高效聚类(DBSCAN和BIRCH),基于自适应顺序隐马尔可夫模型的多人跟踪和交叉模糊度降低技术以及(iii)新的基于自适应深度学习的域自适应技术,以提高HAR在数据稀缺和多样性(设备、位置和种群多样性)情况下的准确性。我们使用(i)三台设备(3D激光雷达和79ghz毫米波)从6名参与者那里采集的实时PCD对我们的框架和系统进行了实验评估,(ii)一个公开的三维激光雷达活动数据(28名参与者)和(iii)一个嵌入式硬件原型系统,该系统在多居民(96%)场景中提供了良好的HAR性能,多人跟踪比最先进的框架提高了63%,而不会损失边缘计算设备的显著系统性能。 摘要:With the advancement of deep neural networks and computer vision-based Human Activity Recognition, employment of Point-Cloud Data technologies (LiDAR, mmWave) has seen a lot interests due to its privacy preserving nature. Given the high promise of accurate PCD technologies, we develop, PALMAR, a multiple-inhabitant activity recognition system by employing efficient signal processing and novel machine learning techniques to track individual person towards developing an adaptive multi-inhabitant tracking and HAR system. More specifically, we propose (i) a voxelized feature representation-based real-time PCD fine-tuning method, (ii) efficient clustering (DBSCAN and BIRCH), Adaptive Order Hidden Markov Model based multi-person tracking and crossover ambiguity reduction techniques and (iii) novel adaptive deep learning-based domain adaptation technique to improve the accuracy of HAR in presence of data scarcity and diversity (device, location and population diversity). We experimentally evaluate our framework and systems using (i) a real-time PCD collected by three devices (3D LiDAR and 79 GHz mmWave) from 6 participants, (ii) one publicly available 3D LiDAR activity data (28 participants) and (iii) an embedded hardware prototype system which provided promising HAR performances in multi-inhabitants (96%) scenario with a 63% improvement of multi-person tracking than state-of-art framework without losing significant system performances in the edge computing device.
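下面是一个示意性的"点云体素化 + DBSCAN 聚类"片段(并非 PALMAR 的实现),用于说明摘要中"体素化特征表示与聚类跟踪多人"的基本流程;体素大小、DBSCAN 参数以及模拟的两人点云均为演示假设。

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(1)
# 假设:两个人各自产生一簇 3D 点(单位:米)
person_a = rng.normal(loc=[0.0, 0.0, 1.0], scale=0.2, size=(200, 3))
person_b = rng.normal(loc=[3.0, 1.0, 1.0], scale=0.2, size=(200, 3))
points = np.vstack([person_a, person_b])

# 体素化:按 0.1 m 体素取整并去重,得到稀疏体素表示
voxels = np.unique(np.floor(points / 0.1).astype(int), axis=0)

# DBSCAN 按空间密度把体素分成若干"人"簇(-1 为噪声)
labels = DBSCAN(eps=3.0, min_samples=5).fit_predict(voxels)
print("检测到的簇数:", len(set(labels)) - (1 if -1 in labels else 0))
```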

【4】 Multiple Organ Failure Prediction with Classifier-Guided Generative Adversarial Imputation Networks 标题:基于分类器引导的生成对抗插补网络的多器官功能衰竭预测

作者:Xinlu Zhang,Yun Zhao,Rachael Callcut,Linda Petzold 机构:Department of Computer Science, University of California, Santa Barbara, USA, UC Davis Health, Davis, USA 备注:BioKDD 链接:https://arxiv.org/abs/2106.11878 摘要:多器官衰竭(MOF)是重症监护病房(ICU)患者中死亡率较高的一种严重综合征。早期准确的检测对于临床医生及时做出决定至关重要。将机器学习模型应用于电子健康记录(EHRs)的一个基本挑战是缺失值的普遍存在。现有的大多数插补方法都是在数据预处理阶段进行的,无法捕捉数据与结果之间的关系以用于下游预测。在本文中,我们提出了分类器引导的生成对抗插补网络(Classifier-GAIN)用于MOF预测,通过同时结合观测数据和标签信息来弥补这一差距。具体来说,分类器从生成器(插补器)获取插补值来预测任务结果,并通过联合训练向生成器提供额外的监督信号。分类器引导的生成器在训练过程中利用标签感知来填充缺失值,从而提高了分类器在推理阶段的性能。我们进行了大量实验,结果表明,我们的方法在一系列缺失数据场景和评估指标上始终优于经典和最先进的神经基线。 摘要:Multiple organ failure (MOF) is a severe syndrome with a high mortality rate among Intensive Care Unit (ICU) patients. Early and precise detection is critical for clinicians to make timely decisions. An essential challenge in applying machine learning models to electronic health records (EHRs) is the pervasiveness of missing values. Most existing imputation methods are involved in the data preprocessing phase, failing to capture the relationship between data and outcome for downstream predictions. In this paper, we propose classifier-guided generative adversarial imputation networks (Classifier-GAIN) for MOF prediction to bridge this gap, by incorporating both observed data and label information. Specifically, the classifier takes imputed values from the generator (imputer) to predict task outcomes and provides additional supervision signals to the generator by joint training. The classifier-guided generator imputes missing values with label-awareness during training, improving the classifier's performance during inference. We conduct extensive experiments showing that our approach consistently outperforms classical and state-of-the-art neural baselines across a range of missing data scenarios and evaluation metrics.

【5】 Towards Automated Evaluation of Explanations in Graph Neural Networks 标题:图神经网络中解释的自动评价研究

作者:Vanya BK,Balaji Ganesan,Aniket Saxena,Devbrat Sharma,Arvind Agarwal 备注:5 pages, 4 figures, XAI Workshop at ICML 2021 链接:https://arxiv.org/abs/2106.11864 摘要:用易于理解的术语向人工智能应用的最终用户解释图形神经网络预测仍然是一个未解决的问题。特别是,我们没有完善的方法来自动评估解释,更接近用户如何使用这些解释。基于最近的应用趋势和我们在实际问题中的经验,我们提出了GNN解释的自动评估方法。 摘要:Explaining Graph Neural Networks predictions to end users of AI applications in easily understandable terms remains an unsolved problem. In particular, we do not have well developed methods for automatically evaluating explanations, in ways that are closer to how users consume those explanations. Based on recent application trends and our own experiences in real world problems, we propose automatic evaluation approaches for GNN Explanations.

【6】 Speeding Up OPFython with Numba 标题:使用Numba加速OPFython

作者:Gustavo H. de Rosa,João Paulo Papa 机构:Department of Computing, São Paulo State University, Bauru, São Paulo - Brazil 备注:12 pages, 1 figure 链接:https://arxiv.org/abs/2106.11828 摘要:一种受图启发的分类器——最优路径森林(OPF)——已被证明是最先进的算法,在各种各样的任务中可与Logistic回归、支持向量机相媲美。最近,其基于Python的版本OPFython被提出,以提供更友好的框架和更快的原型开发环境。然而,基于Python的算法要比对应的基于C的算法慢,在面对大量数据时会影响其性能。因此,本文提出了一种简单而高效的加速方案:使用Numba包加速基于Numpy的计算,以提升算法的整体性能。实验结果表明,该方法比朴素的基于Python的OPF取得了更好的结果,并加快了其距离计算速度。 摘要:A graph-inspired classifier, known as Optimum-Path Forest (OPF), has proven to be a state-of-the-art algorithm comparable to Logistic Regressors, Support Vector Machines in a wide variety of tasks. Recently, its Python-based version, denoted as OPFython, has been proposed to provide a more friendly framework and a faster prototyping environment. Nevertheless, Python-based algorithms are slower than their counterpart C-based algorithms, impacting their performance when confronted with large amounts of data. Therefore, this paper proposed a simple yet highly efficient speed up using the Numba package, which accelerates Numpy-based calculations and attempts to increase the algorithm's overall performance. Experimental results showed that the proposed approach achieved better results than the na\"ive Python-based OPF and speeded up its distance measurement calculation.
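作为补充,下面给出一个用 Numba 的 @njit 装饰器加速成对欧氏距离计算的通用小例子(并非 OPFython 源码,函数名与数据均为演示假设),用来说明摘要中"加速基于 Numpy 的距离计算"这一思路。

```python
import numpy as np
from numba import njit

@njit(cache=True)
def pairwise_sqdist(x, y):
    """JIT 编译的成对欧氏距离平方计算。"""
    n, m, d = x.shape[0], y.shape[0], x.shape[1]
    out = np.empty((n, m))
    for i in range(n):
        for j in range(m):
            s = 0.0
            for k in range(d):
                diff = x[i, k] - y[j, k]
                s += diff * diff
            out[i, j] = s
    return out

x = np.random.rand(500, 16)
y = np.random.rand(400, 16)
print(pairwise_sqdist(x, y).shape)  # 首次调用含编译开销,之后接近 C 级速度
```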

【7】 A Clustering-based Framework for Classifying Data Streams 标题:一种基于聚类的数据流分类框架

作者:Xuyang Yan,Abdollah Homaifar,Mrinmoy Sarkar,Abenezer Girma,Edward Tunstel 机构:North Carolina A&T State University, Greensboro, NC, USA, Raytheon Technologies Research Center, East Hartford, CT, USA 备注:This paper has been accepted by IJCAI 2021 链接:https://arxiv.org/abs/2106.11823 摘要:数据流的非平稳性对传统的机器学习技术提出了严峻的挑战。尽管已经提出了一些解决方案来扩展传统的机器学习技术来处理数据流,但是这些方法要么需要初始标签集,要么依赖于专门的设计参数。类之间的重叠和数据流的标记构成了对数据流进行分类的其他主要挑战。在本文中,我们提出了一个基于聚类的数据流分类框架来处理非平稳数据流,而不需要使用初始标签集。采用基于密度的流聚类方法,通过动态阈值捕获新概念,并引入有效的主动标签查询策略,从数据流中不断学习新概念。探讨每个类的子类结构,以处理类之间的重叠。实验结果和定量比较研究表明,该方法在统计上比现有方法有更好的或可比的性能。 摘要:The non-stationary nature of data streams strongly challenges traditional machine learning techniques. Although some solutions have been proposed to extend traditional machine learning techniques for handling data streams, these approaches either require an initial label set or rely on specialized design parameters. The overlap among classes and the labeling of data streams constitute other major challenges for classifying data streams. In this paper, we proposed a clustering-based data stream classification framework to handle non-stationary data streams without utilizing an initial label set. A density-based stream clustering procedure is used to capture novel concepts with a dynamic threshold and an effective active label querying strategy is introduced to continuously learn the new concepts from the data streams. The sub-cluster structure of each cluster is explored to handle the overlap among classes. Experimental results and quantitative comparison studies reveal that the proposed method provides statistically better or comparable performance than the existing methods.
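下面用一个极简片段示意"动态阈值 + 主动标签查询"的思路(并非论文算法本身):当新样本离所有已知簇中心都超过阈值时才向人工查询标签(视为新概念),否则沿用最近簇的标签;阈值、数据和"人工标注"函数均为演示假设。

```python
import numpy as np

rng = np.random.default_rng(2)
centers = {}           # 标签 -> 簇中心
threshold = 1.5        # 演示用的固定阈值(实际应随数据流动态更新)

def process(x, oracle):
    """处理数据流中的单个样本;必要时主动查询标签。"""
    if centers:
        label, dist = min(((l, np.linalg.norm(x - c)) for l, c in centers.items()),
                          key=lambda t: t[1])
        if dist <= threshold:
            return label, False          # 距离足够近:直接用最近簇的标签
    label = oracle(x)                    # 超出阈值:查询真实标签(新概念)
    centers[label] = x.copy()
    return label, True

stream = [rng.normal(loc=[0, 0], size=2) for _ in range(5)] + \
         [rng.normal(loc=[5, 5], size=2) for _ in range(5)]
oracle = lambda x: int(x.sum() > 5)      # 演示用的"人工"标注
for x in stream:
    print(process(np.asarray(x), oracle))
```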

【8】 Evo* 2021 -- Late-Breaking Abstracts Volume 标题:EVO*2021--最新摘要卷

作者:A. M. Mora,A. I. Esparcia-Alcázar 备注:LBAs accepted in Evo* 2021. Part of the Conference Proceedings 链接:https://arxiv.org/abs/2106.11804 摘要:本卷收录了提交给Evo* 2021会议的最新突破(Late-Breaking)摘要,该会议于2021年4月7日至9日在线举行。这些论文介绍了正在进行的研究和初步结果,考察了将各类受生物启发的方法(主要是进化计算)应用于不同问题的情况,其中大多数为真实世界问题。 摘要:Volume with the Late-Breaking Abstracts submitted to the Evo* 2021 Conference, held online from 7 to 9 of April 2021. These papers present ongoing research and preliminary results investigating on the application of different approaches of Bioinspired Methods (mainly Evolutionary Computation) to different problems, most of them real world ones.

【9】 Exemplars-guided Empathetic Response Generation Controlled by the Elements of Human Communication 标题:人类交际要素控制下的样例引导的同理心反应生成

作者:Navonil Majumder,Deepanway Ghosal,Devamanyu Hazarika,Alexander Gelbukh,Rada Mihalcea,Soujanya Poria 机构:Singapore University of Technology, and Design, Singapore, National University of Singapore, Instituto Politécnico Nacional, Mexico City, Mexico, University of Michigan, Ann Arbor, Michigan, USA 链接:https://arxiv.org/abs/2106.11791 摘要:现有的移情反应生成方法大多依赖于情境中的情感来生成移情反应。然而,移情远不止是用适当的情绪产生反应。它也常常需要微妙的表达理解和个人共鸣与其他对话者的情况。不幸的是,这样的质量很难量化,而且数据集缺乏相关的注释。为了解决这个问题,在本文中,我们提出了一种方法,依靠范例提示生成模式的优良文体特性,信号移情的对话者。为此,我们采用密集通道检索的方法从训练集中提取相关的典型反应。人类交流的三个要素——情感的存在、解释和探索,以及情感,通过使用合成标签来引导一代人走向移情。人类的评价也被人类交流的这些要素所延伸。我们的经验表明,这些方法在移情反应的质量方面产生了显著的改善,无论是自动的还是人工评估的指标。可在https://github.com/declare-lab/exemplary-empathy. 摘要:The majority of existing methods for empathetic response generation rely on the emotion of the context to generate empathetic responses. However, empathy is much more than generating responses with an appropriate emotion. It also often entails subtle expressions of understanding and personal resonance with the situation of the other interlocutor. Unfortunately, such qualities are difficult to quantify and the datasets lack the relevant annotations. To address this issue, in this paper we propose an approach that relies on exemplars to cue the generative model on fine stylistic properties that signal empathy to the interlocutor. To this end, we employ dense passage retrieval to extract relevant exemplary responses from the training set. Three elements of human communication -- emotional presence, interpretation, and exploration, and sentiment are additionally introduced using synthetic labels to guide the generation towards empathy. The human evaluation is also extended by these elements of human communication. We empirically show that these approaches yield significant improvements in empathetic response quality in terms of both automated and human-evaluated metrics. The implementation is available at https://github.com/declare-lab/exemplary-empathy.
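作为"从训练集中检索相关范例回应"这一步骤的极简替代示意(论文使用的是稠密段落检索 DPR,此处为降低依赖改用 TF-IDF 余弦相似度),其中的候选回应语料与查询上下文均为假设示例。

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# 假设的训练集回应,充当候选范例
exemplars = [
    "I'm so sorry to hear that, losing a pet is heartbreaking.",
    "That sounds exciting, congratulations on the new job!",
    "It must be stressful to move during exam season.",
]
query_context = "My dog passed away last night and I can't stop crying."

vec = TfidfVectorizer().fit(exemplars + [query_context])
sims = cosine_similarity(vec.transform([query_context]), vec.transform(exemplars))[0]
best = sims.argmax()
print("检索到的范例:", exemplars[best], f"(相似度 {sims[best]:.3f})")
```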

【10】 A Stealthy and Robust Fingerprinting Scheme for Generative Models 标题:一种适用于生成模型的隐蔽且鲁棒的指纹方案

作者:Li Guanlin,Guo Shangwei,Wang Run,Xu Guowen,Zhang Tianwei 机构:Nanyang Technological University, Chongqing University, Wuhan University 链接:https://arxiv.org/abs/2106.11760 摘要:本文提出了一种新的用于生成模型知识产权保护的指纹识别方法。已有的针对判别模型的方案通常采用对抗样本作为指纹,这会带来异常的推理行为和预测结果。因此,这些方法并不隐蔽,很容易被对手识别。我们的方法利用不可见后门技术来克服上述局限。具体来说,我们设计了验证样本,其模型输出看起来正常,但可以触发后门分类器做出异常预测。我们提出了一种新的后门嵌入方法,结合独特三元组损失(Unique-Triplet Loss)和细粒度分类,以提高指纹的有效性。大量评估结果表明,该方案在各种GAN模型上都优于其他策略,具有更高的鲁棒性、唯一性和隐蔽性。 摘要:This paper presents a novel fingerprinting methodology for the Intellectual Property protection of generative models. Prior solutions for discriminative models usually adopt adversarial examples as the fingerprints, which give anomalous inference behaviors and prediction results. Hence, these methods are not stealthy and can be easily recognized by the adversary. Our approach leverages the invisible backdoor technique to overcome the above limitation. Specifically, we design verification samples, whose model outputs look normal but can trigger a backdoor classifier to make abnormal predictions. We propose a new backdoor embedding approach with Unique-Triplet Loss and fine-grained categorization to enhance the effectiveness of our fingerprints. Extensive evaluations show that this solution can outperform other strategies with higher robustness, uniqueness and stealthiness for various GAN models.
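下面给出标准三元组损失(triplet loss)的最小实现示意,用来帮助理解摘要中"Unique-Triplet Loss"所基于的基本形式(论文中的具体变体未必相同);margin 及示例向量均为演示假设。

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.5):
    """标准三元组损失:希望 anchor 与 positive 的距离比与 negative 的距离至少小 margin。"""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

a = np.array([0.1, 0.2, 0.3])
p = np.array([0.12, 0.21, 0.33])   # 同一"指纹"触发的输出
n = np.array([0.9, -0.4, 0.5])     # 无关输出
print(triplet_loss(a, p, n))
```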

【11】 Trinity: A No-Code AI platform for complex spatial datasets 标题:Trinity:一个面向复杂空间数据集的无代码人工智能平台

作者:C. V. Krishnakumar Iyer,Feili Hou,Henry Wang,Yonghong Wang,Kay Oh,Swetava Ganguli,Vipul Pandey 备注:12 pages, Submitted to SIGSPATIAL '21 链接:https://arxiv.org/abs/2106.11756 摘要:我们提出了一个名为Trinity的无代码人工智能(AI)平台,其主要设计目标是使机器学习研究人员和非技术地理空间领域专家能够对特定领域的信号和数据集进行实验,以自行解决各种复杂问题。这种解决不同问题的多功能性是通过转换复杂的时空数据集来实现的,使它们能够被标准的深度学习模型(在本例中是卷积神经网络(CNN))所使用,并提供以标准方式(例如语义分割)来描述不同问题的能力。凭借直观的用户界面、承载复杂功能工程衍生产品的功能商店、深度学习内核和可扩展的数据处理机制,Trinity为领域专家提供了一个强大的平台,让他们与科学家和工程师共享解决关键业务问题的舞台。它通过标准化模型构建和部署,实现了快速原型设计、快速实验和缩短生产时间。在本文中,我们介绍了Trinity及其设计背后的动机,并展示了示例应用程序,以鼓励降低使用AI的门槛。 摘要:We present a no-code Artificial Intelligence (AI) platform called Trinity with the main design goal of enabling both machine learning researchers and non-technical geospatial domain experts to experiment with domain-specific signals and datasets for solving a variety of complex problems on their own. This versatility to solve diverse problems is achieved by transforming complex Spatio-temporal datasets to make them consumable by standard deep learning models, in this case, Convolutional Neural Networks (CNNs), and giving the ability to formulate disparate problems in a standard way, eg. semantic segmentation. With an intuitive user interface, a feature store that hosts derivatives of complex feature engineering, a deep learning kernel, and a scalable data processing mechanism, Trinity provides a powerful platform for domain experts to share the stage with scientists and engineers in solving business-critical problems. It enables quick prototyping, rapid experimentation and reduces the time to production by standardizing model building and deployment. In this paper, we present our motivation behind Trinity and its design along with showcasing sample applications to motivate the idea of lowering the bar to using AI.

【12】 LV-BERT: Exploiting Layer Variety for BERT 标题:LV-BERT:利用层多样性改进BERT

作者:Weihao Yu,Zihang Jiang,Fei Chen,Qibin Hou,Jiashi Feng 机构:National University, of Singapore, Huawei Noah’s, Ark Lab 备注:Accepted to Findings of ACL 2021. The code and pre-trained models are available at this https URL 链接:https://arxiv.org/abs/2106.11740 摘要:现代预训练语言模型大多建立在以交错顺序堆叠自注意力层和前馈层的主干网络之上。在本文中,除了这种刻板的层模式之外,我们还从层类型集合和层顺序两个方面利用层的多样性来改进预训练模型。具体来说,除了原有的自注意力层和前馈层,我们在层类型集合中引入了卷积,实验发现这对预训练模型是有益的。此外,除了原有的交错顺序,我们探索了更多的层顺序,以发现更强大的架构。然而,引入的层多样性导致了超过数十亿个候选模型的巨大架构空间,而从头训练单个候选模型已经需要巨大的计算成本,使得通过直接训练大量候选模型来搜索这样的空间是负担不起的。为了解决这个问题,我们首先预训练一个超网,所有候选模型的权重都可以从中继承,然后采用以预训练精度为指导的进化算法来寻找最优结构。大量实验表明,该方法得到的LV-BERT模型在各种下游任务上都优于BERT及其变体。例如,LV-BERT-small在GLUE测试集上达到78.8,比强基线ELECTRA-small高出1.8。 摘要:Modern pre-trained language models are mostly built upon backbones stacking self-attention and feed-forward layers in an interleaved order. In this paper, beyond this stereotyped layer pattern, we aim to improve pre-trained models by exploiting layer variety from two aspects: the layer type set and the layer order. Specifically, besides the original self-attention and feed-forward layers, we introduce convolution into the layer type set, which is experimentally found beneficial to pre-trained models. Furthermore, beyond the original interleaved order, we explore more layer orders to discover more powerful architectures. However, the introduced layer variety leads to a large architecture space of more than billions of candidates, while training a single candidate model from scratch already requires huge computation cost, making it not affordable to search such a space by directly training large amounts of candidate models. To solve this problem, we first pre-train a supernet from which the weights of all candidate models can be inherited, and then adopt an evolutionary algorithm guided by pre-training accuracy to find the optimal architecture. Extensive experiments show that LV-BERT model obtained by our method outperforms BERT and its variants on various downstream tasks. For example, LV-BERT-small achieves 78.8 on the GLUE testing set, 1.8 higher than the strong baseline ELECTRA-small.

【13】 Lifted Model Checking for Relational MDPs 标题:关系MDP的提升模型检测

作者:Wen-Chi Yang,Jean-François Raskin,Luc De Raedt 链接:https://arxiv.org/abs/2106.11735 摘要:模型检测是为了验证具有随机和非确定性行为的系统而发展起来的,被用来为这类系统提供保证。虽然大多数模型检测方法侧重于命题模型,但各种概率规划和强化学习框架处理的是关系域,例如STRIPS规划和关系马尔可夫决策过程。在关系设置中使用命题模型检测需要对模型进行落地(grounding),这会导致众所周知的状态爆炸问题和不可处理性。我们提出了pCTL-REBEL,一种在关系MDP上验证pCTL性质的提升式模型检测方法。它将关系Bellman更新算子REBEL(一种用于基于模型的关系强化学习的提升式值迭代方法)扩展到关系模型检测。pCTL-REBEL是提升式(lifted)的,这意味着该方法不做落地,而是在抽象的关系层面上利用对称性进行推理。理论上,我们证明了只要状态具有有界的大小,即使对于可能无限的域,pCTL模型检测问题对于关系MDP也是可判定的。实践上,我们给出了提升式关系模型检测的算法和实现,并证明了提升式方法提高了模型检测的可扩展性。 摘要:Model checking has been developed for verifying the behaviour of systems with stochastic and non-deterministic behavior. It is used to provide guarantees about such systems. While most model checking methods focus on propositional models, various probabilistic planning and reinforcement frameworks deal with relational domains, for instance, STRIPS planning and relational Markov Decision Processes. Using propositional model checking in relational settings requires one to ground the model, which leads to the well known state explosion problem and intractability. We present pCTL-REBEL, a lifted model checking approach for verifying pCTL properties on relational MDPs. It extends REBEL, the relational Bellman update operator, which is a lifted value iteration approach for model-based relational reinforcement learning, toward relational model-checking. PCTL-REBEL is lifted, which means that rather than grounding, the model exploits symmetries and reasons at an abstract relational level. Theoretically, we show that the pCTL model checking approach is decidable for relational MDPs even for possibly infinite domains provided that the states have a bounded size. Practically, we contribute algorithms and an implementation of lifted relational model checking, and we show that the lifted approach improves the scalability of the model checking approach.
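pCTL-REBEL 建立在(提升式的)值迭代之上。下面给出命题层面的标准值迭代最小示意,帮助理解其所提升的 Bellman 更新算子;这个两状态 MDP 的转移与奖励均为演示假设,并非论文中的关系模型。

```python
import numpy as np

# 假设的 2 状态、2 动作 MDP:P[a][s, s'] 为转移概率,R[s, a] 为奖励
P = np.array([[[0.9, 0.1], [0.2, 0.8]],
              [[0.5, 0.5], [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(200):                     # 反复应用 Bellman 最优算子直至收敛
    Q = R + gamma * np.einsum("ast,t->sa", P, V)
    V_new = Q.max(axis=1)
    if np.max(np.abs(V_new - V)) < 1e-8:
        break
    V = V_new
print("V* ≈", V)
```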

【14】 On Constrained Optimization in Differentiable Neural Architecture Search 标题:可微神经结构搜索中的约束优化问题研究

作者:Kaitlin Maile,Erwan Lecarpentier,Hervé Luga,Dennis G. Wilson 机构: IRIT, University of Toulouse, Toulouse, France, IRT Saint-Exupery, Toulouse, France, ISAE-SUPAERO, University of Toulouse, Toulouse, France 链接:https://arxiv.org/abs/2106.11655 摘要:可微结构搜索(DARTS)是最近提出的一种基于可微松弛的神经结构搜索(NAS)方法。由于它的成功,最近提出了许多分析和改进DARTS框架的变体。通过将问题看作一个约束双层优化问题,我们提出并分析了三种改进方案,即结构权重竞争、更新调度和向离散化方向的正则化。首先,我们引入一种新的方法来激活体系结构权重,它可以防止边缘内的混淆竞争,并允许跨边缘的公平比较,以帮助离散化。接下来,我们提出了一个基于每小批量网络信息的动态调度方案,以使体系结构更新更加及时。最后,我们考虑了两种正则化方法,基于离散化的邻近性和交替方向乘子法(ADMM)算法,以促进早期离散化。我们的结果表明,这种新的激活方案减少了最终架构的大小,并且正则化提高了搜索结果的可靠性,同时保持了与NAS中最新技术相当的性能,特别是当与我们新的动态通知调度一起使用时。 摘要:Differentiable Architecture Search (DARTS) is a recently proposed neural architecture search (NAS) method based on a differentiable relaxation. Due to its success, numerous variants analyzing and improving parts of the DARTS framework have recently been proposed. By considering the problem as a constrained bilevel optimization, we propose and analyze three improvements to architectural weight competition, update scheduling, and regularization towards discretization. First, we introduce a new approach to the activation of architecture weights, which prevents confounding competition within an edge and allows for fair comparison across edges to aid in discretization. Next, we propose a dynamic schedule based on per-minibatch network information to make architecture updates more informed. Finally, we consider two regularizations, based on proximity to discretization and the Alternating Directions Method of Multipliers (ADMM) algorithm, to promote early discretization. Our results show that this new activation scheme reduces final architecture size and the regularizations improve reliability in search results while maintaining comparable performance to state-of-the-art in NAS, especially when used with our new dynamic informed schedule.

【15】 MMD-MIX: Value Function Factorisation with Maximum Mean Discrepancy for Cooperative Multi-Agent Reinforcement Learning 标题:MMD-MIX:面向协作式多智能体强化学习的基于最大均值差异的价值函数分解

作者:Zhiwei Xu,Dapeng Li,Yunpeng Bai,Guoliang Fan 机构:Fusion Innovation Center, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China 备注:7 pages, 2 figures, 2 tables. Accepted by IJCNN 2021 链接:https://arxiv.org/abs/2106.11652 摘要:在现实世界中,许多任务需要多个agent在局部观测条件下相互协作。为了解决这类问题,人们提出了许多基于集中训练、分散执行的多智能体强化学习方法。一类代表性工作是价值分解,它将全局联合Q值$Q_\text{jt}$分解为个体Q值$Q_a$,以指导个体的行为,例如VDN(价值分解网络)和QMIX。然而,这些基线往往忽略了环境中的随机性。我们提出MMD-MIX,一种结合分布强化学习和价值分解的方法来缓解上述弱点。此外,为了提高数据采样效率,我们受到REM(Random Ensemble Mixture)这一鲁棒RL算法的启发,将随机性显式地引入到MMD-MIX中。实验表明,在星际争霸多智能体挑战(SMAC)环境中,MMD-MIX的性能优于先前的基线。 摘要:In the real world, many tasks require multiple agents to cooperate with each other under the condition of local observations. To solve such problems, many multi-agent reinforcement learning methods based on Centralized Training with Decentralized Execution have been proposed. One representative class of work is value decomposition, which decomposes the global joint Q-value $Q_\text{jt}$ into individual Q-values $Q_a$ to guide individuals' behaviors, e.g. VDN (Value-Decomposition Networks) and QMIX. However, these baselines often ignore the randomness in the situation. We propose MMD-MIX, a method that combines distributional reinforcement learning and value decomposition to alleviate the above weaknesses. Besides, to improve data sampling efficiency, we were inspired by REM (Random Ensemble Mixture) which is a robust RL algorithm to explicitly introduce randomness into the MMD-MIX. The experiments demonstrate that MMD-MIX outperforms prior baselines in the StarCraft Multi-Agent Challenge (SMAC) environment.
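价值分解(如 VDN)的核心是用个体 Q 值之和近似联合 Q 值,即 Q_jt ≈ Σ_a Q_a。下面是这一基线思想的最小示意(MMD-MIX 在此基础上引入分布强化学习与 MMD,本片段不涉及);智能体数量与 Q 值均为演示假设,用随机数代替网络输出。

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, n_actions = 3, 4

# 每个智能体基于自身局部观测得到的个体 Q 值(此处用随机数代替网络输出)
Q_individual = rng.normal(size=(n_agents, n_actions))

# 分散执行:每个智能体各自贪婪选动作
actions = Q_individual.argmax(axis=1)

# 集中训练:VDN 风格的联合 Q 值 = 个体 Q 值之和
Q_joint = Q_individual[np.arange(n_agents), actions].sum()
print("actions:", actions, "Q_jt:", round(float(Q_joint), 3))
```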

【16】 Zero-Shot Chinese Character Recognition with Stroke-Level Decomposition 标题:基于笔画级分解的零样本汉字识别

作者:Jingye Chen,Bin Li,Xiangyang Xue 机构:Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University 备注:7 pages, 7 figures 链接:https://arxiv.org/abs/2106.11613 摘要:汉字识别由于其广泛的应用而引起了人们的广泛研究兴趣。虽然已经研究了多年,但该领域的一些问题尚未完全解决,例如零样本问题。以往基于字符和基于部首的方法并没有从根本上解决零样本问题,因为在数据匮乏的条件下,测试集中的一些字符或部首可能不会出现在训练集中。受"人类只要学会了一些字符的笔画顺序,就能举一反三地写出以前未见过的字符"这一事实的启发,我们提出了一种基于笔画的方法,将每个字符分解成一系列笔画,而笔画是汉字最基本的单位。然而,我们观察到笔画序列和汉字之间存在一对多的关系。为了解决这一挑战,我们采用了一种基于匹配的策略来将预测的笔画序列转换成特定的字符。我们在手写字符、印刷艺术字符和场景字符上对所提出的方法进行了评估。实验结果表明,该方法在字符零样本和部首零样本任务上均优于现有方法。此外,该方法还可以很容易地推广到其他字符可以分解为笔画的语言。 摘要:Chinese character recognition has attracted much research interest due to its wide applications. Although it has been studied for many years, some issues in this field have not been completely resolved yet, e.g. the zero-shot problem. Previous character-based and radical-based methods have not fundamentally addressed the zero-shot problem since some characters or radicals in test sets may not appear in training sets under a data-hungry condition. Inspired by the fact that humans can generalize to know how to write characters unseen before if they have learned stroke orders of some characters, we propose a stroke-based method by decomposing each character into a sequence of strokes, which are the most basic units of Chinese characters. However, we observe that there is a one-to-many relationship between stroke sequences and Chinese characters. To tackle this challenge, we employ a matching-based strategy to transform the predicted stroke sequence to a specific character. We evaluate the proposed method on handwritten characters, printed artistic characters, and scene characters. The experimental results validate that the proposed method outperforms existing methods on both character zero-shot and radical zero-shot tasks. Moreover, the proposed method can be easily generalized to other languages whose characters can be decomposed into strokes.
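下面用一个很小的查表例子示意"预测的笔画序列到具体汉字"的匹配策略(论文中的匹配基于完整的笔画字典和更细致的度量,此处的笔画编码、字典和编辑距离均为演示假设)。

```python
# 假设的笔画编码:1 横 2 竖 3 撇 4 捺 5 折
stroke_dict = {
    "一": [1],
    "十": [1, 2],
    "人": [3, 4],
    "木": [1, 2, 3, 4],
}

def edit_distance(a, b):
    """经典动态规划编辑距离,用于容忍笔画预测中的少量错误。"""
    dp = [[i + j if i * j == 0 else 0 for j in range(len(b) + 1)] for i in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + (a[i - 1] != b[j - 1]))
    return dp[len(a)][len(b)]

def match(pred_strokes):
    """把预测的笔画序列映射到编辑距离最近的字符。"""
    return min(stroke_dict, key=lambda ch: edit_distance(pred_strokes, stroke_dict[ch]))

print(match([1, 2, 3, 4]))   # -> 木
print(match([3, 4, 4]))      # 带噪声的预测,仍匹配到 人
```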

【17】 Multi-layered Semantic Representation Network for Multi-label Image Classification 标题:用于多标签图像分类的多层语义表示网络

作者:Xiwen Qu,Hao Che,Jun Huang,Linchuan Xu,Xiao Zheng 机构:School of Computer Science and Technology, Anhui University of Technology, China, Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China, Australian National University, Australia 链接:https://arxiv.org/abs/2106.11596 摘要:多标签图像分类(Multi-label image classification,MLIC)是一项基础性和实用性的工作,其目的是为一幅图像分配多个可能的标签。近年来,许多基于深度卷积神经网络(CNN)的方法被提出,通过建立标签相关性模型来发现标签的语义和学习图像的语义表示。本文通过对标签关联建模和语义表示学习的改进,提出了这一研究方向。一方面,除了每个标签的局部语义外,我们建议进一步研究多个标签共享的全局语义。另一方面,现有的方法主要学习CNN的最后一个卷积层的语义表示。但是人们注意到,CNN的不同层的图像表示捕获了不同层次或尺度的特征,并且具有不同的辨别能力。因此,我们建议学习在多个卷积层的语义表示。为此,本文设计了一个多层语义表示网络(MSRN),通过对标签相关性的建模,发现标签的局部语义和全局语义,并利用标签语义通过注意机制指导多层语义表示学习。在voc2007、COCO、NUS-WIDE和Apparel等四个基准数据集上进行的大量实验表明,与最先进的模型相比,所提出的MSRN具有竞争力。 摘要:Multi-label image classification (MLIC) is a fundamental and practical task, which aims to assign multiple possible labels to an image. In recent years, many deep convolutional neural network (CNN) based approaches have been proposed which model label correlations to discover semantics of labels and learn semantic representations of images. This paper advances this research direction by improving both the modeling of label correlations and the learning of semantic representations. On the one hand, besides the local semantics of each label, we propose to further explore global semantics shared by multiple labels. On the other hand, existing approaches mainly learn the semantic representations at the last convolutional layer of a CNN. But it has been noted that the image representations of different layers of CNN capture different levels or scales of features and have different discriminative abilities. We thus propose to learn semantic representations at multiple convolutional layers. To this end, this paper designs a Multi-layered Semantic Representation Network (MSRN) which discovers both local and global semantics of labels through modeling label correlations and utilizes the label semantics to guide the semantic representations learning at multiple layers through an attention mechanism. Extensive experiments on four benchmark datasets including VOC 2007, COCO, NUS-WIDE, and Apparel show a competitive performance of the proposed MSRN against state-of-the-art models.

【18】 Reinforcement learning for PHY layer communications 标题:用于物理层通信的强化学习

作者:Philippe Mary,Visa Koivunen,Christophe Moy 备注:Machine Learning and Wireless Communications, In press 链接:https://arxiv.org/abs/2106.11595 摘要:在本章中,我们将通过定义不同类别的问题以及处理这些问题的可能解决方案,给出应用RL优化无线通信物理层的综合实例。在第9.2节中,我们介绍了解决RL问题所需的所有基本理论,即马尔可夫决策过程(MDP)、部分可观测马尔可夫决策过程(POMDP),以及两个非常重要且应用广泛的RL算法,即Q-learning和SARSA算法。我们还介绍了深度强化学习(DRL)范式,该节最后介绍了多臂老虎机(MAB)框架。第9.3节通过一些玩具示例来说明RL的基本概念如何应用于通信系统。我们使用与本章第9.2节类似的符号,从文献中选取了一些采用简化系统模型的应用。在第9.3节中,我们还着重讨论RL问题的建模,即如何选择动作空间、状态空间和奖励。本章在第9.4节中以对RL发展趋势的前瞻性思考作结,并在第9.5节中对更广泛的技术现状进行回顾。 摘要:In this chapter, we will give comprehensive examples of applying RL in optimizing the physical layer of wireless communications by defining different class of problems and the possible solutions to handle them. In Section 9.2, we present all the basic theory needed to address a RL problem, i.e. Markov decision process (MDP), Partially observable Markov decision process (POMDP), but also two very important and widely used algorithms for RL, i.e. the Q-learning and SARSA algorithms. We also introduce the deep reinforcement learning (DRL) paradigm and the section ends with an introduction to the multi-armed bandits (MAB) framework. Section 9.3 focuses on some toy examples to illustrate how the basic concepts of RL are employed in communication systems. We present applications extracted from literature with simplified system models using similar notation as in Section 9.2 of this Chapter. In Section 9.3, we also focus on modeling RL problems, i.e. how action and state spaces and rewards are chosen. The Chapter is concluded in Section 9.4 with a prospective thought on RL trends and it ends with a review of a broader state of the art in Section 9.5.
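该章第9.2节涉及的 Q-learning 是 RL 的基础算法之一,其核心只有一条更新式。下面给出表格型 Q-learning 更新的最小示意(环境、学习率等均为演示假设,并非该章节的具体实验)。

```python
import numpy as np

n_states, n_actions = 4, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.95, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    """演示用的玩具环境:动作 1 向右移动,到达最后一个状态时给奖励。"""
    s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
    r = 1.0 if s_next == n_states - 1 else 0.0
    return s_next, r

for episode in range(500):
    s = 0
    for _ in range(10):
        a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
        s_next, r = step(s, a)
        # Q-learning 更新:Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next
print(Q)
```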

【19】 A Vertical Federated Learning Framework for Graph Convolutional Network 标题:一种用于图卷积网络的纵向联邦学习框架

作者:Xiang Ni,Xiaolong Xu,Lingjuan Lyu,Changhua Meng,Weiqiang Wang 机构:Ant Group 链接:https://arxiv.org/abs/2106.11593 摘要:近年来,图神经网络(GNN)在图数据的各种实际问题上取得了显著的成功。然而,在大多数行业中,数据以数据孤岛的形式存在,数据的隐私与安全也是一个重要问题。在本文中,我们提出了联邦GCN学习范式FedVGCN,用于数据纵向划分场景下的隐私保护节点分类任务,该范式可以推广到现有的GCN模型。具体来说,我们将计算图数据分为两部分。在训练过程的每次迭代中,双方在同态加密下相互传递中间结果。我们在基准数据上进行了实验,结果证明了FedVGCN在GraphSage情形下的有效性。 摘要:Recently, Graph Neural Network (GNN) has achieved remarkable success in various real-world problems on graph data. However in most industries, data exists in the form of isolated islands and the data privacy and security is also an important issue. In this paper, we propose FedVGCN, a federated GCN learning paradigm for privacy-preserving node classification task under data vertically partitioned setting, which can be generalized to existing GCN models. Specifically, we split the computation graph data into two parts. For each iteration of the training process, the two parties transfer intermediate results to each other under homomorphic encryption. We conduct experiments on benchmark data and the results demonstrate the effectiveness of FedVGCN in the case of GraphSage.

【20】 Continuous-Depth Neural Models for Dynamic Graph Prediction 标题:动态图预测的连续深度神经模型

作者:Michael Poli,Stefano Massaroli,Clayton M. Rabideau,Junyoung Park,Atsushi Yamashita,Hajime Asama,Jinkyoo Park 机构: 2The University of Tokyo 3Syntensor 备注:Extended version of the workshop paper "Graph Neural Ordinary Differential Equations". arXiv admin note: substantial text overlap with arXiv:1911.07532 链接:https://arxiv.org/abs/2106.11581 摘要:介绍了连续深度图神经网络的结构。神经图微分方程(Neural-graph-differential equations,Neural-GDEs)被形式化为GNNs的对应形式,其中输入输出关系由GNN层的连续统一体决定,混合了离散拓扑结构和微分方程。该框架与静态GNN模型兼容,并通过混合动力系统理论扩展到动态和随机环境。在这里,神经GDE通过利用基本的动力学几何结构来提高性能,进一步引入了适应不规则采样数据的能力。结果证明了所提出的模型在交通量预测、遗传调控网络预测等方面的有效性。 摘要:We introduce the framework of continuous-depth graph neural networks (GNNs). Neural graph differential equations (Neural GDEs) are formalized as the counterpart to GNNs where the input-output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. The proposed framework is shown to be compatible with static GNN models and is extended to dynamic and stochastic settings through hybrid dynamical system theory. Here, Neural GDEs improve performance by exploiting the underlying dynamics geometry, further introducing the ability to accommodate irregularly sampled data. Results prove the effectiveness of the proposed models across applications, such as traffic forecasting or prediction in genetic regulatory networks.
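Neural GDE 的核心是把"离散堆叠的 GNN 层"替换为"对隐藏状态沿深度方向的微分方程做数值积分"。下面用显式欧拉法对一个极简的图扩散型动力学 dH/dt = tanh(ÂHW) 做积分来示意这一点;邻接矩阵、权重与步长均为演示假设,并非论文实现(论文中通常使用更精确的 ODE 求解器)。

```python
import numpy as np

rng = np.random.default_rng(4)
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
A_hat = A + np.eye(4)
A_hat = A_hat / A_hat.sum(axis=1, keepdims=True)   # 行归一化的邻接矩阵(含自环)

W = rng.normal(scale=0.1, size=(3, 3))             # 沿深度共享的"层"权重
H = rng.normal(size=(4, 3))                        # 4 个节点的初始特征

def f(H):
    """深度方向的动力学:一次图卷积后接 tanh。"""
    return np.tanh(A_hat @ H @ W)

dt, T = 0.1, 1.0
for _ in range(int(T / dt)):                       # 欧拉积分 ≈ "连续深度"的前向传播
    H = H + dt * f(H)
print(H)
```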

【21】 Universal Domain Adaptation in Ordinal Regression 标题:有序回归中的泛域自适应

作者:Chidlovskii Boris,Assem Sadek,Christian Wolf 机构:Naver Labs Europe, ch. Maupertuis , Meylan, France, INSA-Lyon, LIRIS, UMR CNRS , Villeurbanne 链接:https://arxiv.org/abs/2106.11576 摘要:我们讨论了序数回归(OR)中的泛域适应(UDA)问题,它试图解决标签不是独立的,而是遵循自然顺序的分类问题。我们证明了为分类而开发的UDA技术和基于聚类假设的UDA技术在或设置中的性能不足。我们提出了一种用顺序学习辅助任务来补充OR分类器的方法,它起到区分公共实例和私有实例的双重作用,并通过排序将类标签扩展到私有目标图像。结合对抗域识别,我们的模型能够处理闭集、部分和开集配置。我们在三个人脸年龄估计数据集上对我们的方法进行了评估,结果表明它优于基线方法。 摘要:We address the problem of universal domain adaptation (UDA) in ordinal regression (OR), which attempts to solve classification problems in which labels are not independent, but follow a natural order. We show that the UDA techniques developed for classification and based on the clustering assumption, under-perform in OR settings. We propose a method that complements the OR classifier with an auxiliary task of order learning, which plays the double role of discriminating between common and private instances, and expanding class labels to the private target images via ranking. Combined with adversarial domain discrimination, our model is able to address the closed set, partial and open set configurations. We evaluate our method on three face age estimation datasets, and show that it outperforms the baseline methods.

【22】 Learn to Resolve Conversational Dependency: A Consistency Training Framework for Conversational Question Answering 标题:学会解决会话依赖:会话问答的一致性训练框架

作者:Gangwoo Kim,Hyunjae Kim,Jungsoo Park,Jaewoo Kang 机构:Korea University 备注:12 pages, ACL 2021 链接:https://arxiv.org/abs/2106.11575 摘要:会话问答(CQA)的主要挑战之一是解决会话依赖,如回指和省略。然而,现有的方法并没有明确地训练QA模型如何解决依赖关系,因此这些模型在理解人类对话方面受到限制。在本文中,我们提出了一个新的框架ExCorD(Explicit guidance on how to resolve Conversational Dependency)来提高QA模型理解会话上下文的能力。ExCorD首先生成不需要会话历史就可以理解的自包含问题,然后使用基于一致性的正则化器训练包含原始问题和自包含问题对的QA模型。在我们的实验中,我们证明ExCorD显著地提高了QA模型的性能,在QuAC上提高了1.2f1,在CANARD上提高了5.2f1,同时解决了现有方法的局限性。 摘要:One of the main challenges in conversational question answering (CQA) is to resolve the conversational dependency, such as anaphora and ellipsis. However, existing approaches do not explicitly train QA models on how to resolve the dependency, and thus these models are limited in understanding human dialogues. In this paper, we propose a novel framework, ExCorD (Explicit guidance on how to resolve Conversational Dependency) to enhance the abilities of QA models in comprehending conversational context. ExCorD first generates self-contained questions that can be understood without the conversation history, then trains a QA model with the pairs of original and self-contained questions using a consistency-based regularizer. In our experiments, we demonstrate that ExCorD significantly improves the QA models' performance by up to 1.2 F1 on QuAC, and 5.2 F1 on CANARD, while addressing the limitations of the existing approaches.

【23】 Do Language Models Perform Generalizable Commonsense Inference? 标题:语言模型是否执行可概括的常识推理?

作者:Peifeng Wang,Filip Ilievski,Muhao Chen,Xiang Ren 机构:Department of Computer Science, University of Southern California, Information Sciences Institute, University of Southern California 备注:8 pages, 4 figures. Accepted to ACL'21 Findings 链接:https://arxiv.org/abs/2106.11533 摘要:受预先训练的语言模型(LMs)编码常识知识的证据的启发,最近的工作已经应用LMs自动填充常识知识图(CKGs)。然而,人们对其推广到多个CKG、看不见的关系和新颖的实体缺乏认识。本文从知识容量、可转移性和归纳三个方面分析了LMs进行归纳推理的能力。我们在这三个方面的实验表明:(1)LMs能够适应由多个ckg定义的不同模式,但不能重用知识来推广新的关系(2) 自适应LMs能很好地推广到看不见的对象,但对新奇对象的推广效果较差。如何提高LMs中常识挖掘的可传递性和可归纳性,是下一步研究的重点。 摘要:Inspired by evidence that pretrained language models (LMs) encode commonsense knowledge, recent work has applied LMs to automatically populate commonsense knowledge graphs (CKGs). However, there is a lack of understanding on their generalization to multiple CKGs, unseen relations, and novel entities. This paper analyzes the ability of LMs to perform generalizable commonsense inference, in terms of knowledge capacity, transferability, and induction. Our experiments with these three aspects show that: (1) LMs can adapt to different schemas defined by multiple CKGs but fail to reuse the knowledge to generalize to new relations. (2) Adapted LMs generalize well to unseen subjects, but less so on novel objects. Future work should investigate how to improve the transferability and induction of commonsense mining from LMs.

【24】 Graph Routing between Capsules 标题:胶囊之间的图路由

作者:Yang Li,Wei Zhao,Erik Cambria,Suhang Wang,Steffen Eger 机构:Northwestern Polytechnical University, China, Nanyang Technological University, Singapore, Technical University of Darmstadt, Germany, Pennsylvania State University, USA 备注:None 链接:https://arxiv.org/abs/2106.11531 摘要:胶囊网络中的路由方法通常学习连续层胶囊之间的层次关系,但对同一层胶囊之间的内部关系研究较少,而这种内部关系是文本数据语义理解的关键因素。因此,本文引入一种新的带图路由的胶囊网络来学习这两种关系,其中每一层的胶囊都被视为图的节点。我们研究了从一层胶囊中产生三种不同距离的邻接矩阵和度矩阵的策略,并提出了胶囊之间的图路由机制。我们在五个文本分类数据集上验证了我们的方法,并且我们的发现表明,结合自底向上路由和自顶向下注意的方法表现最好。这种方法展示了跨数据集的泛化能力。与最新的路由方法相比,我们使用的五个数据集的精确度分别提高了0.82、0.39、0.07、1.01和0.02。 摘要:Routing methods in capsule networks often learn a hierarchical relationship for capsules in successive layers, but the intra-relation between capsules in the same layer is less studied, while this intra-relation is a key factor for the semantic understanding in text data. Therefore, in this paper, we introduce a new capsule network with graph routing to learn both relationships, where capsules in each layer are treated as the nodes of a graph. We investigate strategies to yield adjacency and degree matrix with three different distances from a layer of capsules, and propose the graph routing mechanism between those capsules. We validate our approach on five text classification datasets, and our findings suggest that the approach combining bottom-up routing and top-down attention performs the best. Such an approach demonstrates generalization capability across datasets. Compared to the state-of-the-art routing methods, the improvements in accuracy in the five datasets we used were 0.82, 0.39, 0.07, 1.01, and 0.02, respectively.

【25】 Recent Deep Semi-supervised Learning Approaches and Related Works 标题:深度半监督学习方法及相关研究进展

作者:Gyeongho Kim 机构:Department of Industrial Engineering 链接:https://arxiv.org/abs/2106.11528 摘要:作者的这项工作提出了一个最近的半监督学习方法和相关工作的概述。尽管神经网络在各种应用中取得了显著的成功,但仍然存在一些难以克服的限制,包括需要大量的标记数据。因此,半监督学习(semi-supervised learning)越来越重要,它是一种利用稀缺的标签和大量的未标记数据来训练模型(如深度神经网络)的学习方案。基于半监督学习的主要假设,即流形假设、聚类假设和连续性假设,本文回顾了近年来半监督学习方法的研究进展。特别地,对在半监督学习环境中使用深度神经网络的方法进行了初步讨论。此外,本文首先对现有的研究成果进行了分类和阐释,然后详细阐述了统一上述思想的整体方法。 摘要:The author of this work proposes an overview of the recent semi-supervised learning approaches and related works. Despite the remarkable success of neural networks in various applications, there exist few formidable constraints including the need for a large amount of labeled data. Therefore, semi-supervised learning, which is a learning scheme in which the scarce labels and a larger amount of unlabeled data are utilized to train models (e.g., deep neural networks) is getting more important. Based on the key assumptions of semi-supervised learning, which are the manifold assumption, cluster assumption, and continuity assumption, the work reviews the recent semi-supervised learning approaches. In particular, the methods in regard to using deep neural networks in a semi-supervised learning setting are primarily discussed. In addition, the existing works are first classified based on the underlying idea and explained, and then the holistic approaches that unify the aforementioned ideas are detailed.

【26】 Agnostic Reinforcement Learning with Low-Rank MDPs and Rich Observations 标题:具有低秩MDP和丰富观测的不可知强化学习

作者:Christoph Dann,Yishay Mansour,Mehryar Mohri,Ayush Sekhari,Karthik Sridharan 机构:Google Research, Cornell University, Tel Aviv University, Courant Institute of Mathematical Sciences 链接:https://arxiv.org/abs/2106.11519 摘要:在观察空间丰富的问题中,关于可证明有效的强化学习(RL)的研究已经取得了许多新进展。然而,所有这些工作都有一个关于真实MDP最优值函数的强可实现性假设。这种可实现性假设往往过强,在实践中难以成立。在这项工作中,我们考虑更现实的不可知RL设置,它具有丰富的观察空间和一个固定的策略类$\Pi$,其中可能不包含任何接近最优的策略。我们为这种设置提供了一种算法,其误差以底层MDP的秩$d$为界。具体地说,我们的算法具有$\widetilde{O}\left((H^{4d} K^{3d} \log |\Pi|)/\epsilon^2\right)$的样本复杂度界限,其中$H$是片段长度,$K$是动作数,$\epsilon>0$是所需的次优性。我们还提供了一个几乎匹配的下界,它表明在没有进一步假设的情况下,对秩的指数依赖是不可避免的。 摘要:There have been many recent advances on provably efficient Reinforcement Learning (RL) in problems with rich observation spaces. However, all these works share a strong realizability assumption about the optimal value function of the true MDP. Such realizability assumptions are often too strong to hold in practice. In this work, we consider the more realistic setting of agnostic RL with rich observation spaces and a fixed class of policies $\Pi$ that may not contain any near-optimal policy. We provide an algorithm for this setting whose error is bounded in terms of the rank $d$ of the underlying MDP. Specifically, our algorithm enjoys a sample complexity bound of $\widetilde{O}\left((H^{4d} K^{3d} \log |\Pi|)/\epsilon^2\right)$ where $H$ is the length of episodes, $K$ is the number of actions and $\epsilon>0$ is the desired sub-optimality. We also provide a nearly matching lower bound for this agnostic setting that shows that the exponential dependence on rank is unavoidable, without further assumptions.

【27】 SA-LOAM: Semantic-aided LiDAR SLAM with Loop Closure 标题:SA-LOAM:带环路闭合的语义辅助LiDAR SLAM

作者:Lin Li,Xin Kong,Xiangrui Zhao,Wanlong Li,Feng Wen,Hongbo Zhang,Yong Liu 机构: Zhejiang University 链接:https://arxiv.org/abs/2106.11516 摘要:基于激光雷达的SLAM系统是公认的比其他系统更精确和稳定,但其环路闭合检测仍然是一个悬而未决的问题。随着点云三维语义分割技术的发展,可以方便、稳定地获取点云的语义信息,是实现高层次智能化的关键,有利于SLAM。在本文中,我们提出了一种新的基于LOAM的语义辅助lidarslam,称为SA-LOAM,它利用了里程计和环路闭合检测中的语义。具体来说,我们提出了一个语义辅助的ICP,包括语义匹配、下采样和平面约束,并在循环闭合检测模块中集成了基于语义图的位置识别方法。借助于语义,我们可以提高定位精度,有效地检测循环闭包,甚至在大规模场景中也可以构造一个全局一致的语义映射。在KITTI和Ford校园数据集上的大量实验表明,该系统显著提高了基线性能,对未知数据具有泛化能力,取得了与现有方法相比较有竞争力的结果。 摘要:LiDAR-based SLAM system is admittedly more accurate and stable than others, while its loop closure detection is still an open issue. With the development of 3D semantic segmentation for point cloud, semantic information can be obtained conveniently and steadily, essential for high-level intelligence and conductive to SLAM. In this paper, we present a novel semantic-aided LiDAR SLAM with loop closure based on LOAM, named SA-LOAM, which leverages semantics in odometry as well as loop closure detection. Specifically, we propose a semantic-assisted ICP, including semantically matching, downsampling and plane constraint, and integrates a semantic graph-based place recognition method in our loop closure detection module. Benefitting from semantics, we can improve the localization accuracy, detect loop closures effectively, and construct a global consistent semantic map even in large-scale scenes. Extensive experiments on KITTI and Ford Campus dataset show that our system significantly improves baseline performance, has generalization ability to unseen data and achieves competitive results compared with state-of-the-art methods.

【28】 Adapting Stepsizes by Momentumized Gradients Improves Optimization and Generalization 标题:用动量化梯度自适应调整步长可改善优化与泛化

作者:Yizhou Wang,Yue Kang,Can Qin,Yi Xu,Huan Wang,Yulun Zhang,Yun Fu 机构:† Northeastern University, ‡ University of California, Davis 备注:40 pages, 27 figures 链接:https://arxiv.org/abs/2106.11514 摘要:自适应梯度方法(如Adam)在机器学习中取得了巨大的成功。这类方法用过去梯度平方的滑动平均的平方根来缩放梯度,能够实现现代深度神经网络的快速训练。然而,它们的泛化能力往往比随机梯度下降(SGD)差,并且在训练早期容易陷入局部极小。有趣的是,我们发现把Adam预条件项中的梯度替换为其动量化版本可以很好地解决这些问题。直觉上,带动量的梯度包含更精确的方向信息,因此用它来做二阶矩估计比用原始梯度更适合作为缩放依据。据此,我们提出AdaMomentum作为一个新的优化器,以实现训练更快、泛化更好的目标。我们进一步发展了相应的理论来支持其在优化和泛化上的改进,并在凸和非凸条件下提供收敛性保证。在各种模型和任务上的大量实验表明,AdaMomentum在视觉任务上与SGD表现相当,并且在包括语言处理在内的其他任务上取得了一致的最新成果。 摘要:Adaptive gradient methods, such as \textsc{Adam}, have achieved tremendous success in machine learning. Scaling gradients by square roots of the running averages of squared past gradients, such methods are able to attain rapid training of modern deep neural networks. Nevertheless, they are observed to generalize worse than stochastic gradient descent (\textsc{SGD}) and tend to be trapped in local minima at an early stage during training. Intriguingly, we discover that substituting the gradient in the preconditioner term with the momentumized version in \textsc{Adam} can well solve the issues. The intuition is that gradient with momentum contains more accurate directional information and therefore its second moment estimation is a better choice for scaling than raw gradient's. Thereby we propose \textsc{AdaMomentum} as a new optimizer reaching the goal of training faster while generalizing better. We further develop a theory to back up the improvement in optimization and generalization and provide convergence guarantee under both convex and nonconvex settings. Extensive experiments on various models and tasks demonstrate that \textsc{AdaMomentum} exhibits comparable performance to \textsc{SGD} on vision tasks, and achieves state-of-the-art results consistently on other tasks including language processing.
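摘要的直觉是:把 Adam 预条件项(二阶矩)里的原始梯度换成带动量的梯度。下面按这一描述给出一个单参数更新的示意实现(超参数取常见默认值,细节未必与论文完全一致),与标准 Adam 的唯一差别在于 v 的更新使用 m 而非 g。

```python
import numpy as np

def adamomentum_like_update(w, grad_fn, steps=100, lr=1e-2,
                            beta1=0.9, beta2=0.999, eps=1e-8):
    """按摘要描述的思路:用动量化梯度 m 估计二阶矩 v(示意性实现,非论文代码)。"""
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, steps + 1):
        g = grad_fn(w)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * m * m       # 标准 Adam 此处为 g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w

# 在二次函数 f(w) = ||w - 3||^2 上做个小测试,应收敛到 3 附近
grad = lambda w: 2.0 * (w - 3.0)
print(adamomentum_like_update(np.zeros(2), grad, steps=2000))
```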

【29】 Knowing How to Plan 标题:懂得如何规划

作者:Yanjun Li,Yanjing Wang 机构:College of Philosophy, Nankai University, Tianjin, China, Department of Philosophy, Peking University, Beijing, China 备注:None 链接:https://arxiv.org/abs/2106.11504 摘要:最近的文献研究了各种基于规划的know-how逻辑。在本文中,我们使用这样一种逻辑,通过模型检测来进行基于know-how的规划。特别地,我们可以处理以know-how公式为目标的高阶认知规划,例如,找到一个确保p的计划,使得对手不知道如何在将来使p为假。我们给出了有限认知转移系统上模型检测问题的一个PTIME算法,并在完美回忆(perfect recall)假设下对该逻辑进行了公理化。 摘要:Various planning-based know-how logics have been studied in the recent literature. In this paper, we use such a logic to do know-how-based planning via model checking. In particular, we can handle the higher-order epistemic planning involving know-how formulas as the goal, e.g., find a plan to make sure p such that the adversary does not know how to make p false in the future. We give a PTIME algorithm for the model checking problem over finite epistemic transition systems and axiomatize the logic under the assumption of perfect recall.

【30】 Game-Theoretic Models of Moral and Other-Regarding Agents (extended abstract) 标题:道德主体和他者主体的博弈论模型(扩展摘要)

作者:Gabriel Istrate 机构:West University of Timi¸soara 备注:None 链接:https://arxiv.org/abs/2106.11503 摘要:我们研究了有限范式对策中的康德均衡,这是最近在经济学文献中提出的一类非纳什的、道德激励的行为过程。我们强调了一些这样的均衡问题,包括计算的难处理性,一个高价格的错误协调,并有问题的扩展到一般正规形式的游戏。我们基于程序均衡的概念给出了这样一个推广,并指出实际相关的推广可能不存在。为了弥补这一点,我们提出了一些一般的,直观的,可计算的,其他有关的平衡,这是康德平衡的特殊情况,以及一类行动的过程,插值之间纯粹的自我和康德行为。 摘要:We investigate Kantian equilibria in finite normal form games, a class of non-Nashian, morally motivated courses of action that was recently proposed in the economics literature. We highlight a number of problems with such equilibria, including computational intractability, a high price of miscoordination, and problematic extension to general normal form games. We give such a generalization based on concept of program equilibria, and point out that that a practically relevant generalization may not exist. To remedy this we propose some general, intuitive, computationally tractable, other-regarding equilibria that are special cases Kantian equilibria, as well as a class of courses of action that interpolates between purely self-regarding and Kantian behavior.

【31】 Knowledge from Probability 标题:来自概率的知识

作者:Jeremy Goodman,Bernhard Salow 机构:School of Philosophy, University of Southern California, USA, University of Oxford, UK 备注:None 链接:https://arxiv.org/abs/2106.11501 摘要:我们给出了归纳知识与信念的概率分析,并探讨了它关于未来、自然规律以及被不精确测量的量值的知识的预测。该分析将一种以比较正常性(comparative normality)关系表述的知识与信念理论,与对这些关系的概率化归约结合起来。它预测只有极有可能的命题才会被相信,并且许多被广泛接受的信念修正原则都不成立。 摘要:We give a probabilistic analysis of inductive knowledge and belief and explore its predictions concerning knowledge about the future, about laws of nature, and about the values of inexactly measured quantities. The analysis combines a theory of knowledge and belief formulated in terms of relations of comparative normality with a probabilistic reduction of those relations. It predicts that only highly probable propositions are believed, and that many widely held principles of belief-revision fail.

【32】 Are the Players in an Interactive Belief Model Meta-certain of the Model Itself? 标题:交互信念模型中的参与者是否对模型本身具有元确定性?

作者:Satoshi Fukuda 机构:Department of Decision Sciences and IGIER, Bocconi University, Milan, Italy 备注:None 链接:https://arxiv.org/abs/2106.11500 摘要:在一个交互的信念模型中,参与者对模型本身是否“普遍的元确定”?本文将这种隐含的“共同元确定性”假设形式化。为此,本文将玩家信念的对象从事件扩展到基于潜在状态的函数。然后,本文定义了一个玩家的信念生成图:它与每个状态关联,玩家是否相信该状态下的每个事件。本文将它的含义形式化为:“一个玩家(元)确定她自己的信念生成图”或“玩家(元)确定她自己的信念生成图的轮廓(即模型)。”本文表明:一个玩家(元)确定她自己的信念生成图当且仅当她的信念是内省的。玩家通常(元)确定模型当且仅当,对于某个玩家我在某个州相信的任何事件,玩家我在该州相信该事件是共同的信念。本文接着探讨了博弈论解概念的认知表征是否需要“共同元确定性”假设。研究表明:如果每个参与者都有自己的策略和信念生成图的逻辑性和(元)确定性,那么每个参与者都正确地相信自己的理性。因此,只有对理性的共同信念才能导致在严格支配的行为的反复消除中幸存下来的行为。 摘要:In an interactive belief model, are the players "commonly meta-certain" of the model itself? This paper formalizes such implicit "common meta-certainty" assumption. To that end, the paper expands the objects of players' beliefs from events to functions defined on the underlying states. Then, the paper defines a player's belief-generating map: it associates, with each state, whether a player believes each event at that state. The paper formalizes what it means by: "a player is (meta-)certain of her own belief-generating map" or "the players are (meta-)certain of the profile of belief-generating maps (i.e., the model)." The paper shows: a player is (meta-)certain of her own belief-generating map if and only if her beliefs are introspective. The players are commonly (meta-)certain of the model if and only if, for any event which some player i believes at some state, it is common belief at the state that player i believes the event. This paper then asks whether the "common meta-certainty" assumption is needed for an epistemic characterization of game-theoretic solution concepts. The paper shows: if each player is logical and (meta-)certain of her own strategy and belief-generating map, then each player correctly believes her own rationality. Consequently, common belief in rationality alone leads to actions that survive iterated elimination of strictly dominated actions.

【33】 De Re Updates 标题:De Re更新

作者:Michael Cohen,Wen Tang,Yanjing Wang 机构:Department of Philosophy, Stanford University, USA, Peking University, China 备注:None 链接:https://arxiv.org/abs/2106.11497 摘要:在本文中,我们提出了一个轻量级但强大的动态认知逻辑,它不仅刻画了de dicto与de re知识之间的区别,而且还刻画了de dicto与de re更新之间的区别。该逻辑沿袭Wang和Seligman的工作(Proc. AiML 2018),基于一种认知语言的动态化版本,该语言扩展了从动态逻辑借用的赋值算子。基于处理动态算子与赋值之间相互作用的新归约公理,我们得到了公开宣告逻辑和基于事件模型的DEL的对应版本的完全公理化。 摘要:In this paper, we propose a lightweight yet powerful dynamic epistemic logic that captures not only the distinction between de dicto and de re knowledge but also the distinction between de dicto and de re updates. The logic is based on the dynamified version of an epistemic language extended with the assignment operator borrowed from dynamic logic, following the work of Wang and Seligman (Proc. AiML 2018). We obtain complete axiomatizations for the counterparts of public announcement logic and event-model-based DEL based on new reduction axioms taking care of the interactions between dynamics and assignments.

【34】 Collective Argumentation: The Case of Aggregating Support-Relations of Bipolar Argumentation Frameworks 标题:集体论证:两极论证框架支持关系的聚合问题

作者:Weiwei Chen 备注:None 链接:https://arxiv.org/abs/2106.11496 摘要:在许多涉及论据交换的现实生活情境中,个体可能会对论据之间的哪些支持事实上是合理的,即他们提出不同的支持关系的评估有所不同。面对这样的情况,我们不妨把个人对支持关系的论证意见聚合成一个集体的观点,这个集体的观点是可以接受的。在本文中,我们假设在两极论证框架下,个体配备了一组论据和论据之间的一组攻击,但可能具有不同的支持关系。运用社会选择理论中的方法论,我们分析了在支持关系聚合过程中,聚合规则能够保留两极论证框架的哪些语义特征。 摘要:In many real-life situations that involve exchanges of arguments, individuals may differ on their assessment of which supports between the arguments are in fact justified, i.e., they put forward different support-relations. When confronted with such situations, we may wish to aggregate individuals' argumentation views on support-relations into a collective view, which is acceptable to the group. In this paper, we assume that under bipolar argumentation frameworks, individuals are equipped with a set of arguments and a set of attacks between arguments, but with possibly different support-relations. Using the methodology in social choice theory, we analyze what semantic properties of bipolar argumentation frameworks can be preserved by aggregation rules during the aggregation of support-relations.

【35】 Spatial-Temporal Super-Resolution of Satellite Imagery via Conditional Pixel Synthesis 标题:基于条件像元合成的卫星图像时空超分辨率研究

作者:Yutong He,Dingjie Wang,Nicholas Lai,William Zhang,Chenlin Meng,Marshall Burke,David B. Lobell,Stefano Ermon 机构:Wiiliam Zhang, Stanford University 链接:https://arxiv.org/abs/2106.11485 摘要:高分辨率卫星图像已被证明对广泛的任务有用,包括测量全球人口、当地经济生计和生物多样性等。不幸的是,高分辨率图像收集的频率很低,购买成本也很高,因此很难在时间和空间上有效地扩展这些下游任务。我们提出了一种新的条件像素合成模型,该模型利用丰富、低成本、低分辨率的图像在不可用的时间和地点生成精确的高分辨率图像。我们的研究表明,我们的模型在一个关键的下游任务——物体计数——上达到了照片真实的样本质量,并且优于竞争基线,特别是在地面条件变化迅速的地理位置上。 摘要:High-resolution satellite imagery has proven useful for a broad range of tasks, including measurement of global human population, local economic livelihoods, and biodiversity, among many others. Unfortunately, high-resolution imagery is both infrequently collected and expensive to purchase, making it hard to efficiently and effectively scale these downstream tasks over both time and space. We propose a new conditional pixel synthesis model that uses abundant, low-cost, low-resolution imagery to generate accurate high-resolution imagery at locations and times in which it is unavailable. We show that our model attains photo-realistic sample quality and outperforms competing baselines on a key downstream task -- object counting -- particularly in geographic locations where conditions on the ground are changing rapidly.

【36】 SeqNetVLAD vs PointNetVLAD: Image Sequence vs 3D Point Clouds for Day-Night Place Recognition 标题:SeqNetVLAD与PointNetVLAD:图像序列与三维点云的昼夜位置识别

作者:Sourav Garg,Michael Milford 机构:QUT Centre for Robotics, Queensland University of Technology 备注:Accepted to CVPR 2021 Workshop on 3D Vision and Robotics (3DVR). this https URL 链接:https://arxiv.org/abs/2106.11481 摘要:位置识别是移动机器人定位和导航的关键技术。基于图像或视觉位置识别(VPR)是一个具有挑战性的问题,因为场景外观和相机视点在重新访问位置时会发生显著变化。最近基于“序列表示”的VPR方法与传统的序列分数聚合或基于单个图像的技术相比,显示出了良好的结果。与此同时,随着基于深度学习的点云处理技术的发展,基于三维点云的位置识别也在探索之中。然而,一个关键的问题仍然存在:一个显式的基于三维结构的位置表示总是优于一个隐式的基于RGB图像序列的“空间”表示,它可以内在地学习场景结构。在这个扩展的摘要中,我们试图通过考虑类似的“度量跨度”来表示位置,来比较这两种方法。我们将基于三维点云的方法(PointNetVLAD)与基于图像序列的方法(SeqNet等)进行了比较,并展示了基于图像序列的方法在给定度量范围内接近甚至超过基于点云的方法所达到的性能。这些性能变化可归因于输入传感器的数据丰富性以及移动机器人的数据积累策略的差异。虽然对于这两种不同的模式而言,完美的苹果对苹果的比较可能不可行,但所提出的比较朝着回答与空间表示有关的更深层次问题的方向迈出了一步,这些问题与自主驾驶和增强/虚拟现实等数个应用程序有关。公开的源代码https://github.com/oravus/seqNet. 摘要:Place Recognition is a crucial capability for mobile robot localization and navigation. Image-based or Visual Place Recognition (VPR) is a challenging problem as scene appearance and camera viewpoint can change significantly when places are revisited. Recent VPR methods based on ``sequential representations'' have shown promising results as compared to traditional sequence score aggregation or single image based techniques. In parallel to these endeavors, 3D point clouds based place recognition is also being explored following the advances in deep learning based point cloud processing. However, a key question remains: is an explicit 3D structure based place representation always superior to an implicit ``spatial'' representation based on sequence of RGB images which can inherently learn scene structure. In this extended abstract, we attempt to compare these two types of methods by considering a similar ``metric span'' to represent places. We compare a 3D point cloud based method (PointNetVLAD) with image sequence based methods (SeqNet and others) and showcase that image sequence based techniques approach, and can even surpass, the performance achieved by point cloud based methods for a given metric span. These performance variations can be attributed to differences in data richness of input sensors as well as data accumulation strategies for a mobile robot. While a perfect apple-to-apple comparison may not be feasible for these two different modalities, the presented comparison takes a step in the direction of answering deeper questions regarding spatial representations, relevant to several applications like Autonomous Driving and Augmented/Virtual Reality. Source code available publicly https://github.com/oravus/seqNet.

【37】 A Logical Neural Network Structure With More Direct Mapping From Logical Relations 标题:一种与逻辑关系映射更直接的逻辑神经网络结构

作者:Gang Wang 机构: Wang is with the School of Computer Science and Technology, TaiyuanUniversity of Technology 链接:https://arxiv.org/abs/2106.11463 摘要:逻辑关系广泛存在于人类活动中。人类利用它们根据各种条件进行判断和决策,这些条件以规则的形式体现出来。作为一种重要的认知智能,它是将逻辑关系正确地表示和存储到计算机系统中,以便进行自动判断和决策的前提,特别是在医学诊断等高风险领域。然而,现有的数字人工神经网络(ANN)模型在图像识别等感知智能方面表现得很好,而在逻辑表示等认知智能方面表现得不好,阻碍了ANN的进一步应用。为了解决这个问题,研究人员尝试设计逻辑ANN模型来表示和存储逻辑关系。虽然这方面的研究已经取得了一些进展,但由于这些逻辑ANN模型的结构仍然不能更直接地与逻辑关系进行映射,导致相应的逻辑关系无法从它们的网络结构中读出,因此目前的工作仍然存在不足。因此,为了用神经网络结构更清晰地表示逻辑关系,并从中读出逻辑关系,本文根据逻辑表示的需要,通过设计新的逻辑神经元和连接,提出了一种新的逻辑神经网络模型。与最近关于逻辑ANN模型的工作相比,这种逻辑ANN模型采用更直接的映射方法,与逻辑关系有更清晰的对应关系,从而可以按照网络结构的连接模式读出逻辑关系。此外,使用较少的神经元。 摘要:Logical relations widely exist in human activities. Human use them for making judgement and decision according to various conditions, which are embodied in the form of \emph{if-then} rules. As an important kind of cognitive intelligence, it is prerequisite of representing and storing logical relations rightly into computer systems so as to make automatic judgement and decision, especially for high-risk domains like medical diagnosis. However, current numeric ANN (Artificial Neural Network) models are good at perceptual intelligence such as image recognition while they are not good at cognitive intelligence such as logical representation, blocking the further application of ANN. To solve it, researchers have tried to design logical ANN models to represent and store logical relations. Although there are some advances in this research area, recent works still have disadvantages because the structures of these logical ANN models still don't map more directly with logical relations which will cause the corresponding logical relations cannot be read out from their network structures. Therefore, in order to represent logical relations more clearly by the neural network structure and to read out logical relations from it, this paper proposes a novel logical ANN model by designing the new logical neurons and links in demand of logical representation. Compared with the recent works on logical ANN models, this logical ANN model has more clear corresponding with logical relations using the more direct mapping method herein, thus logical relations can be read out following the connection patterns of the network structure. Additionally, less neurons are used.

【38】 Querying in the Age of Graph Databases and Knowledge Graphs 标题:图数据库与知识图谱时代的查询

作者:Marcelo Arenas,Claudio Gutierrez,Juan F. Sequeda 机构:Universidad Católica & IMFD, DCC, Universidad de Chile & IMFD, data.world, USA 链接:https://arxiv.org/abs/2106.11456 摘要:图已经成为我们所知的表示知识的最佳方式。计算机界已经研究并开发了借助数字技术管理图的支持手段。图数据库和知识图谱正是这一进程中涌现出的最成功的解决方案。本教程将为这些进展背后的数据管理任务提供一幅概念地图,并特别关注面向图的数据模型和查询语言。 摘要:Graphs have become the best way we know of representing knowledge. The computing community has investigated and developed the support for managing graphs by means of digital technology. Graph databases and knowledge graphs surface as the most successful solutions to this program. This tutorial will provide a conceptual map of the data management tasks underlying these developments, paying particular attention to data models and query languages for graphs.

【39】 KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers 标题:KaggleDBQA:文本到SQL解析器的现实评估

作者:Chia-Hsuan Lee,Oleksandr Polozov,Matthew Richardson 机构:♢University of Washington, ♠Microsoft Research, Redmond 备注:Published as a conference paper at ACL-IJCNLP 2021 链接:https://arxiv.org/abs/2106.11455 摘要:数据库问答的目标是实现对不同应用领域中的实际关系数据库的自然语言查询。最近,大规模数据集(如Spider和WikiSQL)促进了文本到SQL解析的新建模技术,提高了对未见数据库的零样本泛化能力。在这项工作中,我们研究了仍然阻碍这些技术实际应用的挑战。首先,我们提出了KaggleDBQA,一个新的跨领域的真实Web数据库评估数据集,具有特定领域的数据类型、原始格式和不受限制的问题。第二,我们重新检查在实际环境中应用的文本到SQL解析器的评估任务的选择。最后,我们用数据库文档来扩充我们的领域内评估任务,数据库文档是隐含领域知识的自然来源。我们表明,KaggleDBQA对最先进的零样本解析器提出了挑战,但是更现实的评估设置和对相关数据库文档的创造性使用将它们的准确度提高了13.2%以上,性能提高了一倍。 摘要:The goal of database question answering is to enable natural language querying of real-life relational databases in diverse application domains. Recently, large-scale datasets such as Spider and WikiSQL facilitated novel modeling techniques for text-to-SQL parsing, improving zero-shot generalization to unseen databases. In this work, we examine the challenges that still prevent these techniques from practical deployment. First, we present KaggleDBQA, a new cross-domain evaluation dataset of real Web databases, with domain-specific data types, original formatting, and unrestricted questions. Second, we re-examine the choice of evaluation tasks for text-to-SQL parsers as applied in real-life settings. Finally, we augment our in-domain evaluation task with database documentation, a naturally occurring source of implicit domain knowledge. We show that KaggleDBQA presents a challenge to state-of-the-art zero-shot parsers but a more realistic evaluation setting and creative use of associated database documentation boosts their accuracy by over 13.2%, doubling their performance.

【40】 A Competitive Analysis of Online Multi-Agent Path Finding 标题:在线多智能体寻路的竞争分析

作者:Hang Ma 机构:Simon Fraser University 备注:Published at ICAPS 2021 链接:https://arxiv.org/abs/2106.11454 摘要:我们研究了在线多智能体路径发现(MAPF),其中新的智能体随着时间的推移不断出现,所有的智能体都必须找到通往给定目标位置的无冲突路径。我们将已有的(离线)MAPF的复杂性结果推广到在线MAPF。我们根据(1)可控性(每次规划路径的智能体集合)和(2)合理性(规划路径的质量)将在线MAPF算法分为不同的类别,并研究它们之间的关系。我们对每一类在线MAPF算法的常用目标函数进行了竞争分析。我们证明:一个依次逐个为新出现的智能体规划路径的朴素算法,其关于流通时间(flowtime)和完工时间(makespan)的竞争比,在渐近意义上的上界和下界都由智能体数量给出。然后我们给出了一个反直觉的结果:如果不允许为先前出现的智能体重新规划路径,那么任何合理的在线MAPF算法,包括那些为所有新出现的智能体规划最优路径的算法,都具有与朴素算法相同的渐近竞争比,即使在2D的4邻域网格上也是如此。我们还导出了允许重新规划路径的任何合理在线MAPF算法的竞争比的常数下界。研究结果首次为在线环境下使用MAPF算法的有效性提供了理论依据。 摘要:We study online Multi-Agent Path Finding (MAPF), where new agents are constantly revealed over time and all agents must find collision-free paths to their given goal locations. We generalize existing complexity results of (offline) MAPF to online MAPF. We classify online MAPF algorithms into different categories based on (1) controllability (the set of agents that they can plan paths for at each time) and (2) rationality (the quality of paths they plan) and study the relationships between them. We perform a competitive analysis for each category of online MAPF algorithms with respect to commonly-used objective functions. We show that a naive algorithm that routes newly-revealed agents one at a time in sequence achieves a competitive ratio that is asymptotically bounded from both below and above by the number of agents with respect to flowtime and makespan. We then show a counter-intuitive result that, if rerouting of previously-revealed agents is not allowed, any rational online MAPF algorithms, including ones that plan optimal paths for all newly-revealed agents, have the same asymptotic competitive ratio as the naive algorithm, even on 2D 4-neighbor grids. We also derive constant lower bounds on the competitive ratio of any rational online MAPF algorithms that allow rerouting. The results thus provide theoretical insights into the effectiveness of using MAPF algorithms in an online setting for the first time.
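作为参考,在线算法的竞争比(competitive ratio)通常按如下方式定义(这是标准定义;论文分别针对流通时间与完工时间进行分析,所用记号可能与此处略有不同):

\mathrm{CR}(\mathcal{A}) \;=\; \sup_{I}\, \frac{\mathrm{cost}_{\mathcal{A}}(I)}{\mathrm{cost}_{\mathrm{OPT}}(I)}

其中 I 取遍所有问题实例,\mathcal{A} 为在线算法,\mathrm{OPT} 为离线最优解,\mathrm{cost} 可取流通时间(flowtime)或完工时间(makespan)。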

【41】 Efficient Inference via Universal LSH Kernel 标题:基于通用LSH核的高效推理

作者:Zichang Liu,Benjamin Coleman,Anshumali Shrivastava 机构:Department of Computer Science, Rice University, Houston, TX, Department of Electrical and Computer Engineering 链接:https://arxiv.org/abs/2106.11426 摘要:大型机器学习模型在各种任务上取得了前所未有的性能,并已发展成为首选(go-to)技术。然而,在资源受限的环境中部署这些计算和内存消耗巨大的模型带来了新的挑战。在这项工作中,我们提出了数学上可证明的Representer Sketch,这是一组简洁的计数数组,可以通过简单的散列计算和聚合来近似推理过程。Representer Sketch建立在核方法文献中著名的表示定理(Representer Theorem)的基础上,因此得名,它为高效推理问题提供了一种通用的基本替代方案,超越了量化、迭代剪枝和知识蒸馏等常用方法。神经网络函数被转化为加权核密度表示,用我们的草图算法可以非常高效地估计出核密度。实验表明,Representer Sketch在不降低精度的情况下,存储需求减少了114倍,计算复杂度减少了59倍。 摘要:Large machine learning models achieve unprecedented performance on various tasks and have evolved as the go-to technique. However, deploying these compute and memory hungry models on resource constraint environments poses new challenges. In this work, we propose mathematically provable Representer Sketch, a concise set of count arrays that can approximate the inference procedure with simple hashing computations and aggregations. Representer Sketch builds upon the popular Representer Theorem from kernel literature, hence the name, providing a generic fundamental alternative to the problem of efficient inference that goes beyond the popular approach such as quantization, iterative pruning and knowledge distillation. A neural network function is transformed to its weighted kernel density representation, which can be very efficiently estimated with our sketching algorithm. Empirically, we show that Representer Sketch achieves up to 114x reduction in storage requirement and 59x reduction in computation complexity without any drop in accuracy.
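下面用一段 Python 代码示意“以一组计数数组近似核密度”的基本思路。这只是一个最小的说明性草图,并非论文中 Representer Sketch 的实现:其中假设采用符号随机投影(SRP)作为 LSH 族,并以各计数数组中对应桶的平均计数作为碰撞核意义下的密度估计;把神经网络函数转成加权核密度表示等关键步骤以论文原文为准。

import numpy as np

class LSHDensitySketch:
    """用 R 个计数数组近似核密度的示意实现(假设使用符号随机投影 LSH,非论文代码)。"""
    def __init__(self, dim, n_rows=50, n_bits=8, seed=0):
        rng = np.random.default_rng(seed)
        self.proj = rng.normal(size=(n_rows, n_bits, dim))    # 每行一组随机超平面
        self.counts = np.zeros((n_rows, 2 ** n_bits))         # R 个计数数组
        self.n = 0

    def _hash(self, x):
        bits = (np.einsum('rbd,d->rb', self.proj, x) > 0).astype(int)
        return bits @ (1 << np.arange(bits.shape[1]))         # 每行得到一个桶编号

    def add(self, x):
        idx = self._hash(x)
        self.counts[np.arange(len(idx)), idx] += 1
        self.n += 1

    def density(self, q):
        idx = self._hash(q)
        return self.counts[np.arange(len(idx)), idx].mean() / max(self.n, 1)

# 用法:把数据点流式加入草图,之后只需散列与查表即可估计查询点处的密度
sketch = LSHDensitySketch(dim=16)
for x in np.random.default_rng(1).normal(size=(1000, 16)):
    sketch.add(x)
print(sketch.density(np.zeros(16)))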

【42】 Evaluating Team Skill Aggregation in Online Competitive Games 标题:在线竞技游戏中团队技能聚合的评估

作者:Arman Dehpanah,Muheeb Faizan Ghori,Jonathan Gemmell,Bamshad Mobasher 机构:School of Computing, DePaul University, Chicago, USA 备注:Accepted in IEEE Conference on Games 2021 链接:https://arxiv.org/abs/2106.11397 摘要:在线竞技游戏的主要目标之一是通过确保公平的比赛来提高玩家的参与度。这些游戏使用评级系统来创建平衡的比赛。评级系统利用统计估计来评估玩家的技能,并在匹配玩家之前使用技能评级来预测排名。单个玩家的技能等级可以聚合起来计算团队的技能水平。虽然研究的目的往往是为了提高技能估计的准确性和比赛的公平性,但对于如何根据队员的技能水平来计算球队的技能水平却很少有人关注。在本文中,我们提出了两种新的聚合方法,并将它们与研究文献中广泛使用的标准方法进行了比较。本文详细分析了这些方法对评级系统预测性能的影响。我们使用三个流行的评分系统Elo、Glicko和TrueSkill,在三个真实世界的数据集上进行实验,包括超过10万场大逃杀(battle royale)和一对一(head-to-head)对战。我们的评估表明,在大多数测试案例中,MAX方法优于其他两种方法,这意味着团队的整体绩效最好由其最熟练成员的绩效决定。这项研究的结果强调了设计更详细的计算团队绩效的方法的必要性——这些方法涵盖了球员行为的不同方面,如技能、策略或目标。 摘要:One of the main goals of online competitive games is increasing player engagement by ensuring fair matches. These games use rating systems for creating balanced match-ups. Rating systems leverage statistical estimation to rate players' skills and use skill ratings to predict rank before matching players. Skill ratings of individual players can be aggregated to compute the skill level of a team. While research often aims to improve the accuracy of skill estimation and fairness of match-ups, less attention has been given to how the skill level of a team is calculated from the skill level of its members. In this paper, we propose two new aggregation methods and compare them with a standard approach extensively used in the research literature. We present an exhaustive analysis of the impact of these methods on the predictive performance of rating systems. We perform our experiments using three popular rating systems, Elo, Glicko, and TrueSkill, on three real-world datasets including over 100,000 battle royale and head-to-head matches. Our evaluations show the superiority of the MAX method over the other two methods in the majority of the tested cases, implying that the overall performance of a team is best determined by the performance of its most skilled member. The results of this study highlight the necessity of devising more elaborated methods for calculating a team's performance -- methods covering different aspects of players' behavior such as skills, strategy, or goals.
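下面的 Python 片段示意不同聚合方式的差别。其中 SUM 与 AVG 只是对文献中“标准做法”的假设性示例(摘要并未具体给出),MAX 对应文中在多数情形下表现最好的方法;“聚合值高者获胜”的预测规则也只是最简化的示意。

def aggregate_team_skill(member_ratings, method="MAX"):
    """把队员技能评分聚合为队伍技能(示意;论文中的具体定义以原文为准)。"""
    if method == "SUM":
        return sum(member_ratings)
    if method == "AVG":
        return sum(member_ratings) / len(member_ratings)
    if method == "MAX":      # 文中结论:队伍整体表现最好由最强队员决定
        return max(member_ratings)
    raise ValueError(f"unknown method: {method}")

def predict_winner(team_a, team_b, method="MAX"):
    """聚合值较高的一方被预测获胜(最简化示意)。"""
    a = aggregate_team_skill(team_a, method)
    b = aggregate_team_skill(team_b, method)
    return "A" if a >= b else "B"

# 同样两支队伍,AVG 与 MAX 可能给出不同的预测
print(predict_winner([25.0, 18.5, 31.2], [27.0, 26.5, 26.8], method="AVG"))  # B
print(predict_winner([25.0, 18.5, 31.2], [27.0, 26.5, 26.8], method="MAX"))  # A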

【43】 A Turing Test for Transparency 标题:关于透明度的图灵测试

作者:Felix Biessmann,Viktor Treu 机构:Beuth University of Applied Sciences 备注:Published in Proceedings of the ICML Workshop on Theoretical Foundations, Criticism, and Application Trends of Explainable AI held in conjunction with the 38th International Conference on Machine Learning (ICML) 链接:https://arxiv.org/abs/2106.11394 摘要:可解释人工智能(XAI)的一个中心目标是改善人与人工智能交互中的信任关系。透明人工智能系统研究的一个基本假设是,解释有助于更好地评估机器学习(ML)模型的预测,例如,使人类能够更高效地识别错误的预测。然而,最近的经验证据表明,解释可能产生相反的效果:当给出ML预测的解释时,人类往往倾向于相信这些预测,即使它们是错误的。实验证据表明,这种效应可以归因于AI或解释在多大程度上显得直观、像人。这种效应挑战了XAI的根本目标,并意味着负责任地使用透明AI方法必须考虑人类区分机器生成解释与人类解释的能力。本文基于图灵的模仿游戏,提出了一种针对XAI方法的量化度量,即透明度图灵测试。人类询问者被要求判断一个解释是由人还是由XAI方法生成的。在这个二分类任务中,如果人类无法以显著高于随机水平的准确率识别出某个XAI方法生成的解释,则该方法通过测试。检测这类解释是评估和校准人与人工智能交互中信任关系的一个必要条件。我们在一个众包文本分类任务上给出了实验结果,表明即使对于基本的ML模型和XAI方法,大多数参与者也无法区分人类和机器生成的解释。我们还讨论了我们的结果对透明ML应用的伦理和实践意义。 摘要:A central goal of explainable artificial intelligence (XAI) is to improve the trust relationship in human-AI interaction. One assumption underlying research in transparent AI systems is that explanations help to better assess predictions of machine learning (ML) models, for instance by enabling humans to identify wrong predictions more efficiently. Recent empirical evidence however shows that explanations can have the opposite effect: When presenting explanations of ML predictions humans often tend to trust ML predictions even when these are wrong. Experimental evidence suggests that this effect can be attributed to how intuitive, or human, an AI or explanation appears. This effect challenges the very goal of XAI and implies that responsible usage of transparent AI methods has to consider the ability of humans to distinguish machine generated from human explanations. Here we propose a quantitative metric for XAI methods based on Turing's imitation game, a Turing Test for Transparency. A human interrogator is asked to judge whether an explanation was generated by a human or by an XAI method. Explanations of XAI methods that can not be detected by humans above chance performance in this binary classification task are passing the test. Detecting such explanations is a requirement for assessing and calibrating the trust relationship in human-AI interaction. We present experimental results on a crowd-sourced text classification task demonstrating that even for basic ML models and XAI approaches most participants were not able to differentiate human from machine generated explanations. We discuss ethical and practical implications of our results for applications of transparent ML.
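下面的 Python 片段示意如何根据人类判别结果计算这一度量。这里假设以判别准确率与随机水平(0.5)之间的单侧显著性检验来判定“能否显著高于随机地识别机器解释”,这只是一种可能的实现方式,论文采用的统计处理以原文为准。

import numpy as np

def transparency_turing_test(is_machine, judged_machine, z_threshold=1.645):
    """is_machine / judged_machine 为等长 0/1 序列:真实来源与人类的判断。
    返回判别准确率,以及该 XAI 方法是否“通过”测试(准确率未显著高于 0.5)。"""
    is_machine = np.asarray(is_machine)
    judged_machine = np.asarray(judged_machine)
    acc = float((is_machine == judged_machine).mean())
    n = len(is_machine)
    z = (acc - 0.5) / np.sqrt(0.25 / n)   # 随机猜测下准确率的正态近似
    return acc, bool(z < z_threshold)

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, 200)
guesses = rng.integers(0, 2, 200)          # 纯随机的“人类判断”,应当通过测试
print(transparency_turing_test(truth, guesses))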

【44】 Membership Inference on Word Embedding and Beyond 标题:针对词嵌入及其下游任务的成员推理

作者:Saeed Mahloujifar,Huseyin A. Inan,Melissa Chase,Esha Ghosh,Marcello Hasegawa 机构:Princeton University, Microsoft Research, Microsoft Corporation 链接:https://arxiv.org/abs/2106.11384 摘要:在文本处理上下文中,大多数ML模型都建立在单词嵌入的基础上。这些嵌入本身是在一些数据集上训练的,可能包含敏感数据。在某些情况下,这种训练是独立完成的,在另一些情况下,它是作为一个更大的、特定于任务的模型训练的一部分进行的。在这两种情况下,考虑基于嵌入层的成员推理攻击是理解敏感信息泄漏的一种方法。但是,有点令人惊讶的是,对单词嵌入的成员推理攻击以及它们在使用这些嵌入的自然语言处理(NLP)任务中的作用,仍然没有得到充分的研究。在这项工作中,我们证明了在现实假设下,单词嵌入容易受到黑盒成员身份推理攻击。此外,我们还通过另外两个主要的NLP应用程序(分类和文本生成)证明了这种泄漏仍然存在,即使嵌入层没有暴露给攻击者。我们证明了我们的MI攻击对分类器模型和基于LSTM的语言模型具有较高的攻击精度。实际上,我们的攻击是对文本生成模型的一种廉价的成员推断攻击,它不需要目标模型的知识,也不需要对文本生成模型进行任何昂贵的训练。 摘要:In the text processing context, most ML models are built on word embeddings. These embeddings are themselves trained on some datasets, potentially containing sensitive data. In some cases this training is done independently, in other cases, it occurs as part of training a larger, task-specific model. In either case, it is of interest to consider membership inference attacks based on the embedding layer as a way of understanding sensitive information leakage. But, somewhat surprisingly, membership inference attacks on word embeddings and their effect in other natural language processing (NLP) tasks that use these embeddings, have remained relatively unexplored. In this work, we show that word embeddings are vulnerable to black-box membership inference attacks under realistic assumptions. Furthermore, we show that this leakage persists through two other major NLP applications: classification and text-generation, even when the embedding layer is not exposed to the attacker. We show that our MI attack achieves high attack accuracy against a classifier model and an LSTM-based language model. Indeed, our attack is a cheaper membership inference attack on text-generative models, which does not require the knowledge of the target model or any expensive training of text-generative models as shadow models.

【45】 Phrase-level Active Learning for Neural Machine Translation 标题:神经机器翻译中的短语级主动学习

作者:Junjie Hu,Graham Neubig 机构:Language Technologies Institute, Carnegie Mellon University 链接:https://arxiv.org/abs/2106.11375 摘要:神经机器翻译(NMT)对域移位非常敏感。在本文中,我们在一个主动学习环境中解决这个问题,在这个环境中,我们可以花费一定的预算来翻译域内数据,并在新翻译的数据上逐步微调预先训练好的域外NMT模型。现有的NMT主动学习方法通常是基于不确定性分数来选择句子,但是这些方法需要花费大量的代价来翻译完整的句子,即使句子中只有一个或两个关键短语是有用的。为了解决这一局限性,我们重新审视了基于短语的机器翻译(PBMT)时代以前的工作,即选择的不是完整的句子,而是单个短语。然而,虽然将这些短语合并到PBMT系统中相对简单,但对于NMT系统来说却不那么简单,因为NMT系统需要对完整序列进行训练,以捕获新领域特有的句子的更大结构特性。为了克服这些障碍,我们建议在新的域中从未标记的数据中选择完整的句子和单独的短语来路由到人工翻译。在一个德语-英语翻译任务中,我们的主动学习方法比基于不确定性的句子选择方法取得了一致的改进,比强主动学习基线提高了1.2 BLEU分数。 摘要:Neural machine translation (NMT) is sensitive to domain shift. In this paper, we address this problem in an active learning setting where we can spend a given budget on translating in-domain data, and gradually fine-tune a pre-trained out-of-domain NMT model on the newly translated data. Existing active learning methods for NMT usually select sentences based on uncertainty scores, but these methods require costly translation of full sentences even when only one or two key phrases within the sentence are informative. To address this limitation, we re-examine previous work from the phrase-based machine translation (PBMT) era that selected not full sentences, but rather individual phrases. However, while incorporating these phrases into PBMT systems was relatively simple, it is less trivial for NMT systems, which need to be trained on full sequences to capture larger structural properties of sentences unique to the new domain. To overcome these hurdles, we propose to select both full sentences and individual phrases from unlabelled data in the new domain for routing to human translators. In a German-English translation task, our active learning approach achieves consistent improvements over uncertainty-based sentence selection methods, improving up to 1.2 BLEU score over strong active learning baselines.
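下面的 Python 片段示意文中作为对比基线的“基于不确定性的句子选择”:假设以各词对数概率的平均负值作为句级不确定性,并在固定的词数预算内按不确定性从高到低贪心选择。论文提出的短语级选择方法并未在此体现,此处仅说明基线思路。

def sentence_uncertainty(token_logprobs):
    """句级不确定性:平均负对数概率(常见基线做法之一,仅作示意)。"""
    return -sum(token_logprobs) / max(len(token_logprobs), 1)

def select_for_annotation(candidates, budget_tokens):
    """candidates: [(句子, 该句各词的模型对数概率)];在词数预算内优先选择最不确定的句子。"""
    ranked = sorted(candidates, key=lambda c: sentence_uncertainty(c[1]), reverse=True)
    chosen, used = [], 0
    for sent, logprobs in ranked:
        n_tokens = len(sent.split())
        if used + n_tokens > budget_tokens:
            continue
        chosen.append(sent)
        used += n_tokens
    return chosen

# 假设的德语候选句及其(虚构的)词级对数概率,仅用于演示接口
candidates = [
    ("der neue Impfstoff wurde zugelassen", [-2.1, -3.5, -4.2, -1.0, -0.8]),
    ("guten Morgen", [-0.2, -0.1]),
]
print(select_for_annotation(candidates, budget_tokens=6))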

【46】 Distributed Heuristic Multi-Agent Path Finding with Communication 标题:带通信的分布式启发式多Agent路径搜索

作者:Ziyuan Ma,Yudong Luo,Hang Ma 机构:School of Computing Science, Simon Fraser University 备注:Published at ICRA 2021 链接:https://arxiv.org/abs/2106.11365 摘要:多智能体路径发现(MAPF)是大规模机器人系统的关键。最近的方法已经应用强化学习(RL)来学习部分可观测环境中的分散策略。获得无碰撞策略的一个基本挑战是,代理需要学习协作来处理拥塞情况。本文将通信与深度Q学习相结合,提出了一种基于学习的MAPF方法,其中agent通过图卷积实现协作。为了在长时程的面向目标任务上引导RL算法,我们将来自单一源点的多条最短路径的潜在选择嵌入为启发式指导,而不是像现有的大多数工作那样使用某一条特定路径。我们的方法独立地处理每个agent,并从单个agent的角度训练模型。最后将训练好的策略应用于每个代理,以实现分散执行。整个系统在训练过程中是分布式的,在课程学习策略下进行训练。在多障碍环境下的实验结果表明,该方法成功率高且平均步数少。 摘要:Multi-Agent Path Finding (MAPF) is essential to large-scale robotic systems. Recent methods have applied reinforcement learning (RL) to learn decentralized polices in partially observable environments. A fundamental challenge of obtaining collision-free policy is that agents need to learn cooperation to handle congested situations. This paper combines communication with deep Q-learning to provide a novel learning based method for MAPF, where agents achieve cooperation via graph convolution. To guide RL algorithm on long-horizon goal-oriented tasks, we embed the potential choices of shortest paths from single source as heuristic guidance instead of using a specific path as in most existing works. Our method treats each agent independently and trains the model from a single agent's perspective. The final trained policy is applied to each agent for decentralized execution. The whole system is distributed during training and is trained under a curriculum learning strategy. Empirical evaluation in obstacle-rich environment indicates the high success rate with low average step of our method.
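下面的 Python 片段示意“从目标位置出发计算最短路径距离图”这类启发式引导的最简形式:假设在四邻域栅格上用 BFS 求每个可行格到目标的最短距离,作为智能体的启发式输入。论文中启发式的具体构造(例如对多条最短路径选择的编码)以原文为准。

from collections import deque

def goal_distance_map(grid, goal):
    """四邻域栅格上从目标做 BFS,返回每个可行格到目标的最短步数(障碍与不可达处为 None)。"""
    H, W = len(grid), len(grid[0])
    dist = [[None] * W for _ in range(H)]
    gx, gy = goal
    dist[gx][gy] = 0
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < H and 0 <= ny < W and grid[nx][ny] == 0 and dist[nx][ny] is None:
                dist[nx][ny] = dist[x][y] + 1
                queue.append((nx, ny))
    return dist

grid = [[0, 0, 0],
        [1, 1, 0],
        [0, 0, 0]]   # 1 表示障碍
print(goal_distance_map(grid, (2, 0)))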

【47】 Photozilla: A Large-Scale Photography Dataset and Visual Embedding for 20 Photography Styles 标题:Photozilla:一个大规模摄影数据集和20种摄影样式的视觉嵌入

作者:Trisha Singhal,Junhua Liu,Lucienne T. M. Blessing,Kwan Hui Lim 机构:Lucienne T.M. Blessing, Singapore University of Technology and Design, Singapore, Forth AI, Singapore 备注:In the Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2021. (Poster) 链接:https://arxiv.org/abs/2106.11359 摘要:社交媒体平台的出现促进了数码摄影的发展,从而催生了视觉应用的繁荣。基于这一动机,我们引入了一个称为“Photozilla”的大规模数据集,其中包括超过990k张图像,属于10种不同的摄影风格。然后利用该数据集训练3个分类模型,将图像自动分类为相应的风格,分类准确率约为96%。随着数码摄影的迅速发展,我们看到新的摄影风格以指数级的速度出现。基于此,我们提出了一种新的基于孪生网络(Siamese)的分类网络,该网络以训练好的分类模型为基础结构,只需25个训练样本就可以适应并分类未见过的风格。在识别另外10种不同类型的摄影风格时,准确率超过68%。此数据集见:https://trisha025.github.io/Photozilla/ 摘要:The advent of social media platforms has been a catalyst for the development of digital photography that engendered a boom in vision applications. With this motivation, we introduce a large-scale dataset termed 'Photozilla', which includes over 990k images belonging to 10 different photographic styles. The dataset is then used to train 3 classification models to automatically classify the images into the relevant style which resulted in an accuracy of ~96%. With the rapid evolution of digital photography, we have seen new types of photography styles emerging at an exponential rate. On that account, we present a novel Siamese-based network that uses the trained classification models as the base architecture to adapt and classify unseen styles with only 25 training samples. We report an accuracy of over 68% for identifying 10 other distinct types of photography styles. This dataset can be found at https://trisha025.github.io/Photozilla/
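下面的 Python 片段示意“仅凭少量支持样本识别未见摄影风格”的一种最简做法:假设以余弦相似度近似孪生网络学到的相似度函数,按类别取平均相似度后选最大者。类别名称、嵌入维度均为占位设定,并非论文的实现细节。

import numpy as np

def classify_unseen_style(support_embs, support_labels, query_emb):
    """support_embs: 每类约 25 个支持样本的嵌入;返回与查询图像最相似的风格标签。"""
    sims = {}
    q_norm = np.linalg.norm(query_emb) + 1e-9
    for emb, label in zip(support_embs, support_labels):
        s = float(emb @ query_emb / ((np.linalg.norm(emb) + 1e-9) * q_norm))
        sims.setdefault(label, []).append(s)
    return max(sims, key=lambda c: float(np.mean(sims[c])))

# 用随机向量演示接口;实际中嵌入应来自以已训练分类模型为骨干的网络
rng = np.random.default_rng(0)
support = [rng.normal(size=64) for _ in range(50)]
labels = ["astro"] * 25 + ["macro"] * 25   # 两个假设的未见风格名,仅作占位
print(classify_unseen_style(support, labels, rng.normal(size=64)))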

【48】 Cogment: Open Source Framework For Distributed Multi-actor Training, Deployment & Operations 标题:Cogment:面向分布式多参与者训练、部署和操作的开源框架

作者:AI Redefined,Sai Krishna Gottipati,Sagar Kurandwad,Clodéric Mars,Gregory Szriftgiser,François Chabot 机构:Chabot 备注:16 pages, 7 figures 链接:https://arxiv.org/abs/2106.11345 摘要:由于强化学习和人在回路学习方面的一些进步,让人类直接参与人工智能代理的训练正变得越来越重要。人类可以为代理提供奖励、演示任务、设计课程或在环境中行动,但这些好处也伴随着架构、功能设计和工程复杂性而来。我们提出了Cogment,一个统一的开源框架,它引入了actor形式主义来支持各种人-代理协作类型和训练方法。由于采用了分布式微服务体系结构,它还具有开箱即用的可扩展性,并为上述复杂性提供了解决方案。 摘要:Involving humans directly for the benefit of AI agents' training is getting traction thanks to several advances in reinforcement learning and human-in-the-loop learning. Humans can provide rewards to the agent, demonstrate tasks, design a curriculum, or act in the environment, but these benefits also come with architectural, functional design and engineering complexities. We present Cogment, a unifying open-source framework that introduces an actor formalism to support a variety of humans-agents collaboration typologies and training approaches. It is also scalable out of the box thanks to a distributed micro service architecture, and offers solutions to the aforementioned complexities.

【49】 f-Domain-Adversarial Learning: Theory and Algorithms 标题:f-域对抗学习:理论与算法

作者:David Acuna,Guojun Zhang,Marc T. Law,Sanja Fidler 机构:NVIDIA, University of Toronto, Vector Institute, University of Waterloo 备注:ICML 2021 链接:https://arxiv.org/abs/2106.11344 摘要:无监督域自适应被用于许多机器学习应用中:在训练过程中,模型可以访问目标域中的未标记数据以及一个相关的有标记数据集。本文介绍了一种新颖且通用的领域对抗框架。具体来说,我们推导了一个新的域适应泛化界,它利用了基于f-散度变分刻画的新的分布间差异度量。作为特例,它恢复了Ben-David等人(2010a)的理论结果,并支持实践中常用的散度。基于这个界,我们导出了一个新的算法框架,该框架对Ganin等人(2016)的原始对抗训练方法引入了关键修正。我们表明,要达到与最先进的领域对抗方法相当(甚至更好)的性能,并不需要过去几年在该框架中引入的许多正则化器和特设目标。在真实的自然语言和计算机视觉数据集上进行的实验分析表明,我们的框架优于现有的基线,并且对于领域对抗学习中此前未曾考虑的f-散度获得了最好的结果。 摘要:Unsupervised domain adaptation is used in many machine learning applications where, during training, a model has access to unlabeled data in the target domain, and a related labeled dataset. In this paper, we introduce a novel and general domain-adversarial framework. Specifically, we derive a novel generalization bound for domain adaptation that exploits a new measure of discrepancy between distributions based on a variational characterization of f-divergences. It recovers the theoretical results from Ben-David et al. (2010a) as a special case and supports divergences used in practice. Based on this bound, we derive a new algorithmic framework that introduces a key correction in the original adversarial training method of Ganin et al. (2016). We show that many regularizers and ad-hoc objectives introduced over the last years in this framework are then not required to achieve performance comparable to (if not better than) state-of-the-art domain-adversarial methods. Experimental analysis conducted on real-world natural language and computer vision datasets show that our framework outperforms existing baselines, and obtains the best results for f-divergences that were not considered previously in domain-adversarial learning.
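作为背景,这类工作通常依赖 f-散度的变分刻画,其标准形式如下(论文中实际使用的记号与推广界的具体形式可能不同,此处仅供参考):

D_f(P \,\|\, Q) \;=\; \sup_{T}\; \mathbb{E}_{x\sim P}\!\left[T(x)\right] \;-\; \mathbb{E}_{x\sim Q}\!\left[f^{*}\!\left(T(x)\right)\right]

其中 f^{*} 为 f 的凸共轭,T 为辅助判别函数(在域对抗训练中通常由一个辅助的域判别网络参数化),P、Q 分别对应源域与目标域的特征分布;对 T 所在的函数类加以限制,即得到可优化的下界。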

【50】 Dive into Deep Learning 标题:动手学深度学习(Dive into Deep Learning)

作者:Aston Zhang,Zachary C. Lipton,Mu Li,Alexander J. Smola 机构:Jun 备注:(HTML) this https URL (GitHub) this https URL 链接:https://arxiv.org/abs/2106.11342 摘要:这本开源的书代表了我们试图让深度学习变得平易近人,教读者概念、上下文和代码。整本书是在Jupyter笔记本中起草的,无缝集成了说明图、数学和交互式示例以及自包含的代码。我们的目标是提供一个资源,可以(i)免费提供给每个人(ii)提供足够的技术深度,为实际成为应用机器学习科学家提供起点(iii)包括可运行代码,向读者展示如何在实践中解决问题(iv)允许我们和整个社区快速更新(v) 辅以一个论坛,就技术细节进行互动讨论并回答问题。 摘要:This open-source book represents our attempt to make deep learning approachable, teaching readers the concepts, the context, and the code. The entire book is drafted in Jupyter notebooks, seamlessly integrating exposition figures, math, and interactive examples with self-contained code. Our goal is to offer a resource that could (i) be freely available for everyone; (ii) offer sufficient technical depth to provide a starting point on the path to actually becoming an applied machine learning scientist; (iii) include runnable code, showing readers how to solve problems in practice; (iv) allow for rapid updates, both by us and also by the community at large; (v) be complemented by a forum for interactive discussion of technical details and to answer questions.

【51】 Do sound event representations generalize to other audio tasks? A case study in audio transfer learning 标题:声音事件表示是否适用于其他音频任务?关于音频迁移学习的个案研究

作者:Anurag Kumar,Yun Wang,Vamsi Krishna Ithapu,Christian Fuegen 机构:†Facebook Reality Labs Research, ‡ Facebook Applied AI Research 备注:Accepted Interspeech 2021 链接:https://arxiv.org/abs/2106.11335 摘要:迁移学习是跨多个相关学习问题进行有效信息传递的关键。一种简单而有效的转移学习方法利用在大规模任务中训练的深层神经网络进行特征提取。然后用这种表示来学习相关的下游任务。在这篇论文中,我们研究了在大规模声音事件检测数据集上训练的神经网络所获得的音频表示的转移学习能力。我们通过一个简单的线性分类器传输机制,在广泛的其他音频任务中构建和评估这些表示。我们证明了这种简单的线性传输已经足够强大,可以在下游任务上实现高性能。我们还提供了对声音事件表示的属性的见解,这些属性可以实现这种有效的信息传输。 摘要:Transfer learning is critical for efficient information transfer across multiple related learning problems. A simple, yet effective transfer learning approach utilizes deep neural networks trained on a large-scale task for feature extraction. Such representations are then used to learn related downstream tasks. In this paper, we investigate transfer learning capacity of audio representations obtained from neural networks trained on a large-scale sound event detection dataset. We build and evaluate these representations across a wide range of other audio tasks, via a simple linear classifier transfer mechanism. We show that such simple linear transfer is already powerful enough to achieve high performance on the downstream tasks. We also provide insights into the attributes of sound event representations that enable such efficient information transfer.
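下面的 Python 片段示意文中所说的“简单线性迁移”:在冻结的预训练特征上训练一个线性分类器。示例中的随机特征仅用于说明接口,实际应替换为预训练声音事件检测网络输出的嵌入。

import numpy as np
from sklearn.linear_model import LogisticRegression

def linear_transfer(train_feats, train_labels, test_feats):
    """在冻结特征上训练线性分类器并预测下游任务标签(线性迁移的常见做法,仅作示意)。"""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(train_feats, train_labels)
    return clf.predict(test_feats)

rng = np.random.default_rng(0)
X_train = rng.normal(size=(200, 128))        # 占位:预训练网络输出的 128 维嵌入
y_train = rng.integers(0, 4, 200)            # 占位:下游任务的 4 类标签
X_test = rng.normal(size=(20, 128))
print(linear_transfer(X_train, y_train, X_test))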

【52】 Credal Self-Supervised Learning 标题:Credal(信念集)自监督学习

作者:Julian Lienen,Eyke Hüllermeier 机构:Department of Computer Science, Paderborn University, Paderborn , Germany, Institute of Informatics, University of Munich (LMU), Munich , Germany 备注:17 pages, 1 figure, 7 tables 链接:https://arxiv.org/abs/2106.11853 摘要:自我训练是一种有效的半监督学习方法。其关键思想是让学习者自己根据当前的假设对未标记的实例迭代生成“伪监督”。结合一致性正则化,伪标记在计算机视觉等领域显示出良好的性能。为了说明伪标签的假设性质,通常以概率分布的形式提供。尽管如此,有人可能会说,即使是概率分布也代表了信息量的过高水平,因为它表明学习者准确地知道基本真理条件概率。因此,在我们的方法中,我们允许学习者以credal集的形式标记实例,即(候选)概率分布集。由于这种表达能力的增强,学习者能够以更灵活、更忠实的方式表达不确定性和知识的缺乏。为了从这类弱标记数据中学习,我们利用了最近在所谓的超集学习领域中提出的方法。在详尽的实证评估中,我们将我们的方法与最先进的自我监督方法进行了比较,结果表明,特别是在包含高度不确定性的低标签场景中,我们的方法具有较高的竞争力。 摘要:Self-training is an effective approach to semi-supervised learning. The key idea is to let the learner itself iteratively generate "pseudo-supervision" for unlabeled instances based on its current hypothesis. In combination with consistency regularization, pseudo-labeling has shown promising performance in various domains, for example in computer vision. To account for the hypothetical nature of the pseudo-labels, these are commonly provided in the form of probability distributions. Still, one may argue that even a probability distribution represents an excessive level of informedness, as it suggests that the learner precisely knows the ground-truth conditional probabilities. In our approach, we therefore allow the learner to label instances in the form of credal sets, that is, sets of (candidate) probability distributions. Thanks to this increased expressiveness, the learner is able to represent uncertainty and a lack of knowledge in a more flexible and more faithful manner. To learn from weakly labeled data of that kind, we leverage methods that have recently been proposed in the realm of so-called superset learning. In an exhaustive empirical evaluation, we compare our methodology to state-of-the-art self-supervision approaches, showing competitive to superior performance especially in low-label scenarios incorporating a high degree of uncertainty.

【53】 Algorithmic Recourse in Partially and Fully Confounded Settings Through Bounding Counterfactual Effects 标题:通过界定反事实效应实现部分与完全混杂情形下的算法追索

作者:Julius von Kügelgen,Nikita Agarwal,Jakob Zeitler,Afsaneh Mastouri,Bernhard Schölkopf 机构:Bernhard Schölkopf, Max Planck Institute for Intelligent Systems Tübingen, Germany, Department of Engineering, University of Cambridge, United Kingdom, Graduate Training Centre of Neuroscience, International Max Planck Research School 备注:Preliminary workshop version; work in progress 链接:https://arxiv.org/abs/2106.11849 摘要:算法追索(algorithmic recourse)旨在为个人提供可操作的建议,以便从自动化决策系统中获得更有利的结果。由于它涉及对在物理世界中实施的干预进行推理,追索从根本上说是一个因果问题。现有的方法使用从数据中学习到的因果模型,在无隐藏混杂的假设以及加性噪声等建模假设下,计算追索行动的效果。在Balke和Pearl(1994)的开创性工作的基础上,我们针对离散随机变量提出了一种替代方法,该方法放宽了这些假设,允许存在未观测到的混杂和任意的结构方程。所提出的方法只需要给定因果图和混杂结构,并对追索行动的期望反事实效应给出界。如果下界高于某个阈值,即位于决策边界的另一侧,则在期望意义下保证追索成功。 摘要:Algorithmic recourse aims to provide actionable recommendations to individuals to obtain a more favourable outcome from an automated decision-making system. As it involves reasoning about interventions performed in the physical world, recourse is fundamentally a causal problem. Existing methods compute the effect of recourse actions using a causal model learnt from data under the assumption of no hidden confounding and modelling assumptions such as additive noise. Building on the seminal work of Balke and Pearl (1994), we propose an alternative approach for discrete random variables which relaxes these assumptions and allows for unobserved confounding and arbitrary structural equations. The proposed approach only requires specification of the causal graph and confounding structure and bounds the expected counterfactual effect of recourse actions. If the lower bound is above a certain threshold, i.e., on the other side of the decision boundary, recourse is guaranteed in expectation.

【54】 Analysis and Tuning of a Voice Assistant System for Dysfluent Speech 标题:面向不流畅语音的语音助手系统的分析与调优

作者:Vikramjit Mitra,Zifang Huang,Colin Lea,Lauren Tooley,Sarah Wu,Darren Botten,Ashwini Palekar,Shrinath Thelapurath,Panayiotis Georgiou,Sachin Kajarekar,Jefferey Bigham 机构:Apple, Cupertino, CA, USA 备注:5 pages, 1 page reference, 2 figures 链接:https://arxiv.org/abs/2106.11759 摘要:语音发音的不流畅和变异会严重降低语音识别性能,对许多中重度语音障碍患者来说,语音操控系统无法正常使用。当前的语音识别系统主要用流利说话者的数据训练,因此不能很好地推广到带有不流畅现象的语音,例如语音或单词重复、音素延长或可听的阻塞(audible blocks)。这项工作的重点是:对口吃人群使用消费级语音识别系统的表现进行定量分析,并提出面向实际部署的改进方法,以提升常见语音助手任务(即“天气如何?”)上的性能。在基线条件下,该系统会引入大量插入和替换错误,导致流畅性障碍人群的目标语音词错误率(isWER)恶化13.64%(绝对值)。我们表明,在现有的混合语音识别系统中,只需简单地调整解码参数,就可以将流畅性障碍个体的isWER相对改善24%。对于覆盖各种口吃严重程度的18名受试者,相比默认设置,调整这些参数还带来3.6%的领域识别提升和1.7%的意图识别提升。 摘要:Dysfluencies and variations in speech pronunciation can severely degrade speech recognition performance, and for many individuals with moderate-to-severe speech disorders, voice operated systems do not work. Current speech recognition systems are trained primarily with data from fluent speakers and as a consequence do not generalize well to speech with dysfluencies such as sound or word repetitions, sound prolongations, or audible blocks. The focus of this work is on quantitative analysis of a consumer speech recognition system on individuals who stutter and production-oriented approaches for improving performance for common voice assistant tasks (i.e., "what is the weather?"). At baseline, this system introduces a significant number of insertion and substitution errors resulting in intended speech Word Error Rates (isWER) that are 13.64\% worse (absolute) for individuals with fluency disorders. We show that by simply tuning the decoding parameters in an existing hybrid speech recognition system one can improve isWER by 24\% (relative) for individuals with fluency disorders. Tuning these parameters translates to 3.6\% better domain recognition and 1.7\% better intent recognition relative to the default setup for the 18 study participants across all stuttering severities.

【55】 Image simulation for space applications with the SurRender software 标题:用SurRender软件进行空间应用的图像仿真

作者:Jérémy Lebreton,Roland Brochard,Matthieu Baudry,Grégory Jonniaux,Adrien Hadj Salah,Keyvan Kanani,Matthieu Le Goff,Aurore Masson,Nicolas Ollagnier,Paolo Panicucci,Amsha Proag,Cyril Robin 机构:Airbus Defence & Space, rue des Cosmonautes, Toulouse 备注:11th International ESA Conference on Guidance, Navigation & Control Systems, 22 - 25 June 2021 16 pages, 8 figures 链接:https://arxiv.org/abs/2106.11322 摘要:基于视觉导航的图像处理算法需要可靠的图像仿真能力。在本文中,我们解释了为什么传统的渲染引擎可能存在一些对空间应用而言至关重要的限制。我们介绍了Airbus SurRender软件v7,并详细说明了使其成为一个非常强大的空间图像模拟器的功能。我们展示了SurRender如何处于我们计算机视觉解决方案开发流程的核心,并针对从月球与太阳系探测到在轨交会和行星机器人等多种使用场景,给出了一系列渲染图像示例。 摘要:Image Processing algorithms for vision-based navigation require reliable image simulation capacities. In this paper we explain why traditional rendering engines may present limitations that are potentially critical for space applications. We introduce Airbus SurRender software v7 and provide details on features that make it a very powerful space image simulator. We show how SurRender is at the heart of the development processes of our computer vision solutions and we provide a series of illustrations of rendered images for various use cases ranging from Moon and Solar System exploration, to in orbit rendezvous and planetary robotics.

【56】 Transformer-based Spatial-Temporal Feature Learning for EEG Decoding 标题:基于Transformer的时空特征学习在脑电解码中的应用

作者:Yonghao Song,Xueyu Jia,Lie Yang,Longhan Xie 机构: Lie Yang and Longhan Xie are with theShien-Ming Wu School of Intelligent Engineering, South China Universityof Technology 备注:10 pages, 6 figures 链接:https://arxiv.org/abs/2106.11170 摘要:目前,人们通常采用一些基于卷积神经网络(CNNs)的方法对脑电图(EEG)进行解码。然而,CNNs在感知全局依赖性方面存在局限性,这对于具有强整体关系的常见EEG范式是不够的。针对这一问题,本文提出了一种基于注意机制的脑电解码方法。首先对脑电数据进行预处理和空间滤波。然后,在特征通道维度上进行注意转换,使模型能够增强更多相关的空间特征。最关键的一步是在时间维度上对数据进行切片,进行注意力转换,最终得到高度可分辨的表征。在这个时候,全局平均池和一个简单的完全连接层被用来分类不同类别的脑电数据。在两个公共数据集上的实验表明,注意转换策略有效地利用了空间和时间特征。在脑电多分类中,参数较少,达到了最先进的水平。据我们所知,这是第一次提出一个详细而完整的方法,基于Transformer的思想在这一领域。它对促进脑机接口(BCI)的实用化具有良好的潜力。源代码可以在以下位置找到:\textit{https://github.com/anranknight/EEG-Transformer}. 摘要:At present, people usually use some methods based on convolutional neural networks (CNNs) for Electroencephalograph (EEG) decoding. However, CNNs have limitations in perceiving global dependencies, which is not adequate for common EEG paradigms with a strong overall relationship. Regarding this issue, we propose a novel EEG decoding method that mainly relies on the attention mechanism. The EEG data is firstly preprocessed and spatially filtered. And then, we apply attention transforming on the feature-channel dimension so that the model can enhance more relevant spatial features. The most crucial step is to slice the data in the time dimension for attention transforming, and finally obtain a highly distinguishable representation. At this time, global averaging pooling and a simple fully-connected layer are used to classify different categories of EEG data. Experiments on two public datasets indicate that the strategy of attention transforming effectively utilizes spatial and temporal features. And we have reached the level of the state-of-the-art in multi-classification of EEG, with fewer parameters. As far as we know, it is the first time that a detailed and complete method based on the transformer idea has been proposed in this field. It has good potential to promote the practicality of brain-computer interface (BCI). The source code can be found at: \textit{https://github.com/anranknight/EEG-Transformer}.
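下面给出一个极简的 PyTorch 草图,用来说明“先在通道维做自注意力、再对时间切片做自注意力、最后全局平均池化加线性分类”的数据流。这并非论文的原始网络结构,通道数、采样点数、切片长度等超参数均为占位值;需要 PyTorch 1.9 及以上版本以支持 batch_first 参数。

import torch
import torch.nn as nn

class EEGAttentionSketch(nn.Module):
    """通道维与时间切片维两级自注意力的示意结构(非论文原始模型)。"""
    def __init__(self, n_channels=22, n_samples=1000, slice_len=100, n_classes=4):
        super().__init__()
        assert n_samples % slice_len == 0
        self.slice_len = slice_len
        self.chan_attn = nn.MultiheadAttention(embed_dim=n_samples, num_heads=1, batch_first=True)
        self.time_attn = nn.MultiheadAttention(embed_dim=n_channels * slice_len, num_heads=1, batch_first=True)
        self.fc = nn.Linear(n_channels * slice_len, n_classes)

    def forward(self, x):                       # x: (batch, n_channels, n_samples)
        x, _ = self.chan_attn(x, x, x)          # 通道维注意力:每个通道视为一个 token
        b, c, t = x.shape
        x = x.reshape(b, c, t // self.slice_len, self.slice_len)
        x = x.permute(0, 2, 1, 3).reshape(b, t // self.slice_len, c * self.slice_len)
        x, _ = self.time_attn(x, x, x)          # 时间切片维注意力:每个切片视为一个 token
        return self.fc(x.mean(dim=1))           # 全局平均池化 + 线性分类

model = EEGAttentionSketch()
print(model(torch.randn(8, 22, 1000)).shape)    # torch.Size([8, 4])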

本文分享自微信公众号 arXiv每日学术速递,原始发表于 2021-06-23。
