Natural Language Processing arXiv Digest [Jul 5]

Posted by the WeChat official account arXiv每日学术速递, 2021-07-27

cs.CL: 20 papers in total today

Transformer (5 papers)

【1】 R2D2: Recursive Transformer based on Differentiable Tree for Interpretable Hierarchical Language Modeling

Authors: Xiang Hu, Haitao Mi, Zujie Wen, Yafang Wang, Yi Su, Jing Zheng, Gerard de Melo
Affiliations: Ant Financial Services Group; Hasso Plattner Institute, University of Potsdam
Note: To be published in the proceedings of ACL-IJCNLP 2021
Link: https://arxiv.org/abs/2107.00967
Abstract: Human language understanding operates at multiple levels of granularity (e.g., words, phrases, and sentences) with increasing levels of abstraction that can be hierarchically combined. However, existing deep models with stacked layers do not explicitly model any sort of hierarchical process. This paper proposes a recursive Transformer model based on differentiable CKY-style binary trees to emulate the composition process. We extend the bidirectional language model pre-training objective to this architecture, attempting to predict each word given its left and right abstraction nodes. To scale up our approach, we also introduce an efficient pruned tree induction algorithm to enable encoding in just a linear number of composition steps. Experimental results on language modeling and unsupervised parsing show the effectiveness of our approach.
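For readers who want to picture the chart-based composition, here is a minimal hard-argmax sketch of CKY-style bottom-up binary composition. It is an illustration under stated assumptions only: `compose` and `score` are hypothetical stand-ins for the paper's learned composition function and split scorer, and the actual R2D2 model uses a differentiable (soft) chart plus pruned tree induction rather than this exhaustive cubic loop.

```python
import torch

def cky_compose(leaves, compose, score):
    """Bottom-up chart: every span keeps its best-scoring binary composition.

    leaves:  (n, dim) tensor of word vectors
    compose: callable (left_vec, right_vec) -> span_vec (e.g., a small net)
    score:   callable (span_vec) -> scalar, ranking split points
    """
    n = leaves.size(0)
    chart = {(i, i): leaves[i] for i in range(n)}
    for width in range(1, n):
        for i in range(n - width):
            j = i + width
            # Try every split point k and keep the highest-scoring composition.
            candidates = [compose(chart[(i, k)], chart[(k + 1, j)])
                          for k in range(i, j)]
            chart[(i, j)] = max(candidates, key=lambda c: float(score(c)))
    return chart[(0, n - 1)]   # representation of the whole sentence
```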

【2】 Learned Token Pruning for Transformers

Authors: Sehoon Kim, Sheng Shen, David Thorsley, Amir Gholami, Joseph Hassoun, Kurt Keutzer
Affiliations: University of California, Berkeley; Samsung Semiconductor, Inc.
Link: https://arxiv.org/abs/2107.00910
Abstract: A major challenge in deploying transformer models is their prohibitive inference cost, which quadratically scales with the input sequence length. This makes it especially difficult to use transformers for processing long sequences. To address this, we present a novel Learned Token Pruning (LTP) method that reduces redundant tokens as the data passes through the different layers of the transformer. In particular, LTP prunes tokens with an attention score below a threshold value, which is learned during training. Importantly, our threshold-based method avoids algorithmically expensive operations such as top-k token selection which are used in prior token pruning methods, and also leads to structured pruning. We extensively test the performance of our approach on multiple GLUE tasks and show that our learned threshold-based method consistently outperforms the prior state-of-the-art top-k token based method by up to ~2% higher accuracy with the same amount of FLOPs. Furthermore, our preliminary results show up to 1.4x and 1.9x throughput improvement on Tesla T4 GPU and Intel Haswell CPU, respectively, with less than 1% of accuracy drop (and up to 2.1x FLOPs reduction). Our code has been developed in PyTorch and has been open-sourced.
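The pruning rule itself is compact. Below is a hedged PyTorch sketch of one plausible reading of "prune tokens whose attention score falls below a learned threshold"; the scoring choice (attention received, averaged over heads and query positions) and all names are assumptions, not the authors' open-sourced implementation, which additionally relaxes the hard mask into a differentiable one during training.

```python
import torch

def learned_token_pruning(hidden, attn_probs, mask, threshold):
    """Keep only tokens whose attention score exceeds a per-layer threshold.

    hidden:     (batch, seq, dim) token representations
    attn_probs: (batch, heads, seq, seq) softmaxed attention weights
    mask:       (batch, seq) 1.0 for tokens still alive from earlier layers
    threshold:  scalar; a learnable nn.Parameter during training
    """
    # Score each token by the attention it receives, averaged over
    # heads (dim 1) and then over query positions.
    scores = attn_probs.mean(dim=1).mean(dim=1)        # (batch, seq)
    keep = (scores > threshold).float() * mask         # hard pruning at inference
    return hidden * keep.unsqueeze(-1), keep
```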

【3】 Target-dependent UNITER: A Transformer-Based Multimodal Language Comprehension Model for Domestic Service Robots

Authors: Shintaro Ishikawa, Komei Sugiura
Affiliations: Keio University
Note: Accepted for presentation at IROS 2021
Link: https://arxiv.org/abs/2107.00811
Abstract: Currently, domestic service robots have an insufficient ability to interact naturally through language. This is because understanding human instructions is complicated by various ambiguities and missing information. In existing methods, the referring expressions that specify the relationships between objects are insufficiently modeled. In this paper, we propose Target-dependent UNITER, which learns the relationship between the target object and other objects directly by focusing on the relevant regions within an image, rather than the whole image. Our method is an extension of the UNITER-based Transformer that can be pretrained on general-purpose datasets. We extend the UNITER approach by introducing a new architecture for handling the target candidates. Our model is validated on two standard datasets, and the results show that Target-dependent UNITER outperforms the baseline method in terms of classification accuracy.

【4】 Case Relation Transformer: A Crossmodal Language Generation Model for Fetching Instructions

Authors: Motonari Kambara, Komei Sugiura
Affiliations: Keio University
Note: Accepted for presentation at IROS 2021
Link: https://arxiv.org/abs/2107.00789
Abstract: There have been many studies in robotics to improve the communication skills of domestic service robots. Most studies, however, have not fully benefited from recent advances in deep neural networks because the training datasets are not large enough. In this paper, our aim is to augment the datasets based on a crossmodal language generation model. We propose the Case Relation Transformer (CRT), which generates a fetching instruction sentence from an image, such as "Move the blue flip-flop to the lower left box." Unlike existing methods, the CRT uses the Transformer to integrate the visual features and geometry features of objects in the image. The CRT can handle the objects because of the Case Relation Block. We conducted comparison experiments and a human evaluation. The experimental results show the CRT outperforms baseline methods.

【5】 Transformer-F: A Transformer network with effective methods for learning universal sentence representation

Authors: Yu Shi
Affiliations: School of Software Engineering, Beijing University of Posts and Telecommunications, Beijing, China; Key Laboratory of Trustworthy Distributed Computing and Service, Beijing University of Posts and Telecommunications
Link: https://arxiv.org/abs/2107.00653
Abstract: The Transformer model is widely used in natural language processing for sentence representation. However, the previous Transformer-based models focus on function words that have limited meaning in most cases and could merely extract high-level semantic abstraction features. In this paper, two approaches are introduced to improve the performance of Transformers. We calculated the attention score by multiplying the part-of-speech weight vector with the correlation coefficient, which helps extract the words with more practical meaning. The weight vector is obtained from the input text sequence based on the importance of the part-of-speech. Furthermore, we fuse the features of each layer to make the sentence representation results more comprehensive and accurate. In experiments, we demonstrate the effectiveness of our model Transformer-F on three standard text classification datasets. Experimental results show that our proposed model significantly boosts the performance of text classification as compared to the baseline model. Specifically, we obtain a 5.28% relative improvement over the vanilla Transformer on the simple tasks.
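As a rough illustration of the first idea (rescaling attention by part-of-speech importance), here is a minimal sketch. All names are hypothetical, and the paper's exact formulation multiplies a POS weight vector with a correlation coefficient; this sketch only shows where such a per-token weight could enter scaled dot-product attention.

```python
import torch
import torch.nn.functional as F

def pos_weighted_attention(q, k, v, pos_weights):
    """Scaled dot-product attention with per-token part-of-speech weights.

    q, k, v:     (batch, seq, dim)
    pos_weights: (batch, seq) -- one weight per token derived from its POS
                 tag, e.g., nouns/verbs weighted above function words
    """
    d = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d ** 0.5        # (batch, seq, seq)
    # Upweight keys that carry more practical meaning before normalizing.
    scores = scores * pos_weights.unsqueeze(1)
    return F.softmax(scores, dim=-1) @ v
```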

BERT (2 papers)

【1】 Language Identification of Hindi-English tweets using code-mixed BERT

Authors: Mohd Zeeshan Ansari, M M Sufyan Beg, Tanvir Ahmad, Mohd Jazib Khan, Ghazali Wasim
Affiliations: Department of Computer Engineering, Jamia Millia Islamia, New Delhi, India; Aligarh Muslim University, Aligarh, India
Link: https://arxiv.org/abs/2107.01202
Abstract: Language identification of social media text has been an interesting problem of study in recent years. Social media messages are predominantly code-mixed in non-English speaking states. Prior knowledge from pre-trained contextual embeddings has shown state-of-the-art results for a range of downstream tasks. Recently, models such as BERT have shown that using a large amount of unlabeled data, pretrained language models are even more beneficial for learning common language representations. Extensive experiments exploiting transfer learning and fine-tuning BERT models to identify language on Twitter are presented in this paper. The work utilizes a data collection of Hindi-English-Urdu code-mixed text for language pre-training and Hindi-English code-mixed text for subsequent word-level language classification. The results show that representations pre-trained over code-mixed data produce better results than their monolingual counterparts.
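To make the word-level classification setup concrete, here is a hedged sketch of token classification with Hugging Face Transformers. The checkpoint and label set are placeholders: the paper pretrains on Hindi-English-Urdu code-mixed text, whereas this sketch loads a generic multilingual model whose classification head predicts at random until fine-tuned.

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

# Placeholder checkpoint; substitute a model pretrained on code-mixed text.
MODEL = "bert-base-multilingual-cased"
LABELS = ["hi", "en", "other"]   # assumed word-level language tag set

tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForTokenClassification.from_pretrained(
    MODEL, num_labels=len(LABELS))

tweet = "yeh movie bahut acchi thi, totally worth it"
enc = tokenizer(tweet, return_tensors="pt")
with torch.no_grad():
    logits = model(**enc).logits                 # (1, num_tokens, num_labels)
preds = [LABELS[int(i)] for i in logits.argmax(-1)[0]]
print(list(zip(tokenizer.convert_ids_to_tokens(enc["input_ids"][0]), preds)))
```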

【2】 He Thinks He Knows Better than the Doctors: BERT for Event Factuality Fails on Pragmatics

Authors: Nanjiang Jiang, Marie-Catherine de Marneffe
Affiliations: Department of Linguistics, The Ohio State University
Note: To be published in TACL; pre-MIT Press publication version
Link: https://arxiv.org/abs/2107.00807
Abstract: We investigate how well BERT performs on predicting factuality in several existing English datasets, encompassing various linguistic constructions. Although BERT obtains a strong performance on most datasets, it does so by exploiting common surface patterns that correlate with certain factuality labels, and it fails on instances where pragmatic reasoning is necessary. Contrary to what the high performance suggests, we are still far from having a robust system for factuality prediction.

Graph | Knowledge Graph | Knowledge (1 paper)

【1】 Heterogeneous Graph Attention Network for Multi-hop Machine Reading Comprehension

Authors: Feng Gao, Jian-Cheng Ni, Peng Gao, Zi-Li Zhou, Yan-Yan Li, Hamido Fujita
Link: https://arxiv.org/abs/2107.00841
Abstract: Multi-hop machine reading comprehension is a challenging task in natural language processing, which requires more reasoning ability and explainability. Spectral models based on graph convolutional networks grant the inferring abilities and lead to competitive results; however, part of them still face the challenge of analyzing the reasoning in a human-understandable way. Inspired by the concept of Grandmother Cells in cognitive neuroscience, a spatial graph attention framework named crname (an anonymization placeholder in the abstract), imitating that procedure, is proposed. This model is designed to assemble the semantic features in multi-angle representations and automatically concentrate or alleviate the information for reasoning. The name is a metaphor for the pattern of the model: regard the subjects of queries as the start points of clues, take the reasoning entities as bridge points, consider the latent candidate entities as the grandmother cells, and the clues end up in candidate entities. The proposed model allows us to visualize the reasoning graph and analyze the importance of edges connecting two entities and the selectivity in the mention and candidate nodes, which can be easier to comprehend empirically. Official evaluations on the open-domain multi-hop reading dataset WikiHop and the drug-drug interactions dataset MedHop prove the validity of our approach and show the potential of applying the model in the molecular biology domain.

Reasoning | Analysis | Understanding | Explanation (1 paper)

【1】 DRIFT: A Toolkit for Diachronic Analysis of Scientific Literature

Authors: Abheesht Sharma, Gunjan Chhablani, Harshit Pandey, Rajaswa Patil
Affiliations: Dept. of CS&IS, BITS Pilani, Goa Campus; Dept. of Computer Science, Pune University; Dept. of E & E Engineering
Note: 6 pages, 5 figures; submitted to EMNLP 2021 Demo Track
Link: https://arxiv.org/abs/2107.01198
Abstract: In this work, we present to the NLP community, and to the wider research community as a whole, an application for the diachronic analysis of research corpora. We open-source an easy-to-use tool coined DRIFT, which allows researchers to track research trends and development over the years. The analysis methods are collated from well-cited research works, with a few of our own methods added for good measure. Succinctly put, some of the analysis methods are: keyword extraction, word clouds, predicting declining/stagnant/growing trends using Productivity, tracking bi-grams using Acceleration plots, finding the Semantic Drift of words, tracking trends using similarity, etc. To demonstrate the utility and efficacy of our tool, we perform a case study on the cs.CL corpus of the arXiv repository and draw inferences from the analysis methods. The toolkit and the associated code are available here: https://github.com/rajaswa/DRIFT.

Semi-/Weakly-/Un-supervised | Uncertainty (1 paper)

【1】 Unsupervised Spoken Utterance Classification

Authors: Shahab Jalalvand, Srinivas Bangalore
Affiliations: Interactions Corp., Murray Hill, NJ, USA
Note: 4 pages
Link: https://arxiv.org/abs/2107.01068
Abstract: An intelligent virtual assistant (IVA) enables effortless conversations in call routing through spoken utterance classification (SUC), which is a special form of spoken language understanding (SLU). Building a SUC system requires a large amount of supervised in-domain data that is not always available. In this paper, we introduce an unsupervised spoken utterance classification approach (USUC) that does not require any in-domain data except for the intent labels and a few paraphrases per intent. USUC consists of a KNN classifier (K=1) and a complex embedding model trained on a large amount of unsupervised customer service corpora. Among all embedding models, we demonstrate that Elmo works best for USUC. However, an Elmo model is too slow to be used at run-time for call routing. To resolve this issue, we first compute the uni- and bi-gram embedding vectors offline and build a lookup table of n-grams and their corresponding embedding vectors. Then we use this table to compute sentence embedding vectors at run-time, along with back-off techniques for unseen n-grams. Experiments show that USUC outperforms traditional utterance classification methods by reducing the classification error rate from 32.9% to 27.0% without requiring supervised data. Moreover, our lookup and back-off technique increases the processing speed from 16 utterances per second to 118 utterances per second.
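The run-time trick is straightforward to sketch. Below is a hedged, simplified rendering of the lookup-table idea: `table` maps uni- and bi-grams to embedding vectors computed offline (with Elmo, in the paper), unseen bi-grams back off to uni-grams, and the intent is picked by a 1-nearest-neighbour over paraphrase embeddings. The names, the averaging step, and the cosine scoring are assumptions for illustration.

```python
import numpy as np

def embed_utterance(words, table, dim):
    """Average offline-computed n-gram vectors; back off for unseen bi-grams."""
    vecs = []
    for i, w in enumerate(words):
        bigram = w + " " + words[i + 1] if i + 1 < len(words) else None
        if bigram is not None and bigram in table:
            vecs.append(table[bigram])
        elif w in table:                      # back off to the uni-gram vector
            vecs.append(table[w])
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def classify(utterance, table, intent_paraphrases, dim=1024):
    """1-NN (K=1): return the intent of the closest embedded paraphrase."""
    v = embed_utterance(utterance.lower().split(), table, dim)
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
    return max(((label, cos(p, v))
                for label, vecs in intent_paraphrases.items() for p in vecs),
               key=lambda t: t[1])[0]
```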

Detection-related (1 paper)

【1】 Misinformation Detection on YouTube Using Video Captions

Authors: Raj Jagtap, Abhinav Kumar, Rahul Goel, Shakshi Sharma, Rajesh Sharma, Clint P. George
Affiliations: School of Mathematics and Computer Science, Indian Institute of Technology Goa, India; Institute of Computer Science, University of Tartu, Tartu, Estonia
Link: https://arxiv.org/abs/2107.00941
Abstract: Millions of people use platforms such as YouTube, Facebook, Twitter, and other mass media. Due to the accessibility of these platforms, they are often used to establish a narrative, conduct propaganda, and disseminate misinformation. This work proposes an approach that uses state-of-the-art NLP techniques to extract features from video captions (subtitles). To evaluate our approach, we utilize a publicly accessible and labeled dataset for classifying videos as misinformation or not. The motivation behind exploring video captions stems from our analysis of video metadata. Attributes such as the number of views, likes, dislikes, and comments are ineffective, as videos are hard to differentiate using this information. Using the caption dataset, the proposed models can classify videos among three classes (Misinformation, Debunking Misinformation, and Neutral) with 0.85 to 0.90 F1-score. To emphasize the relevance of the misinformation class, we re-formulate our classification problem as a two-class classification - Misinformation vs. others (Debunking Misinformation and Neutral). In our experiments, the proposed models can classify videos with 0.92 to 0.95 F1-score and 0.78 to 0.90 AUC ROC.
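To show the shape of the three-class caption-classification task, here is a deliberately simple sketch on toy data. Note the swap: the paper extracts features with state-of-the-art NLP techniques, while this sketch uses a plain TF-IDF plus logistic-regression pipeline, and the captions and labels below are invented for illustration.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy captions with the paper's three labels (real data: labeled YouTube subtitles).
captions = [
    "the vaccine contains a microchip that tracks you",
    "fact check: no, vaccines do not contain microchips",
    "today we review the newest smartphone on the market",
]
labels = ["Misinformation", "Debunking Misinformation", "Neutral"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                    LogisticRegression(max_iter=1000))
clf.fit(captions, labels)
print(clf.predict(["new video claims chips are hidden in vaccines"]))
```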

Recognition / Classification (3 papers)

【1】 Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons

Authors: Anastasia Zhukova, Felix Hamborg, Karsten Donnay, Bela Gipp
Affiliations: University of Wuppertal, Germany
Link: https://arxiv.org/abs/2107.00955
Abstract: Unsupervised concept identification through clustering, i.e., identification of semantically related words and phrases, is a common approach to identify contextual primitives employed in various use cases, e.g., text dimension reduction (replacing words with concepts to reduce the vocabulary size), summarization, and named entity resolution. We demonstrate the first results of an unsupervised approach for the identification of groups of persons as actors extracted from a set of related articles. Specifically, the approach clusters mentions of groups of persons that act as non-named entity actors in the texts, e.g., "migrant families" = "asylum-seekers." Compared to our baseline, the approach keeps the mentions of geopolitical entities separated, e.g., "Iran leaders" != "European leaders," and clusters (in)directly related mentions with diverse wording, e.g., "American officials" = "Trump Administration."

【2】 Normalizing Flow based Hidden Markov Models for Classification of Speech Phones with Explainability

Authors: Anubhab Ghosh, Antoine Honoré, Dong Liu, Gustav Eje Henter, Saikat Chatterjee
Affiliations: Digital Futures, and School of Electrical Engg. and Computer Sc., KTH Royal Institute of Technology, Sweden
Note: 12 pages, 4 figures
Link: https://arxiv.org/abs/2107.00730
Abstract: In pursuit of explainability, we develop generative models for sequential data. The proposed models provide state-of-the-art classification results and robust performance for speech phone classification. We combine modern neural networks (normalizing flows) and traditional generative models (hidden Markov models - HMMs). Normalizing flow-based mixture models (NMMs) are used to model the conditional probability distribution given the hidden state in the HMMs. Model parameters are learned through judicious combinations of time-tested Bayesian learning methods and contemporary neural network learning methods. We mainly combine expectation-maximization (EM) and mini-batch gradient descent. The proposed generative models can compute the likelihood of the data and hence are directly suitable for the maximum-likelihood (ML) classification approach. Due to the structural flexibility of HMMs, we can use different normalizing flow models. This leads to different types of HMMs providing diversity in data modeling capacity. The diversity provides an opportunity for easy decision fusion from different models. For a standard speech phone classification setup involving 39 phones (classes) and the TIMIT dataset, we show that the use of standard features called mel-frequency cepstral coefficients (MFCCs), the proposed generative models, and decision fusion together can achieve 86.6% accuracy by generative training only. This result is close to state-of-the-art results, for example, 86.2% accuracy of the PyTorch-Kaldi toolkit [1], and 85.1% accuracy using light gated recurrent units [2]. We do not use any discriminative learning approach and related sophisticated features in this article.
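The ML-classification backbone here is the classical HMM forward algorithm; only the emission model changes. The sketch below assumes `log_emit` has already been filled in by per-state normalizing flows via the change of variables log p(x) = log p_z(f(x)) + log|det df/dx|; the names and shapes are illustrative, not the authors' code.

```python
import torch

def forward_loglik(log_pi, log_A, log_emit):
    """HMM forward algorithm in log space.

    log_pi:   (S,) initial state log-probabilities
    log_A:    (S, S) transition log-probabilities
    log_emit: (T, S) per-frame emission log-likelihoods, here produced by
              one normalizing-flow mixture per hidden state
    """
    alpha = log_pi + log_emit[0]
    for t in range(1, log_emit.size(0)):
        # Sum over previous states, then add the new emission term.
        alpha = torch.logsumexp(alpha.unsqueeze(1) + log_A, dim=0) + log_emit[t]
    return torch.logsumexp(alpha, dim=0)   # log p(x_1..T)
```

Classification then amounts to computing this log-likelihood under one trained HMM per phone class and taking the argmax.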

【3】 Interactive decoding of words from visual speech recognition models

Authors: Brendan Shillingford, Yannis Assael, Misha Denil
Affiliations: DeepMind
Note: 8 pages
Link: https://arxiv.org/abs/2107.00692
Abstract: This work describes an interactive decoding method to improve the performance of visual speech recognition systems using user input to compensate for the inherent ambiguity of the task. Unlike most phoneme-to-word decoding pipelines, which produce phonemes and feed these through a finite state transducer, our method instead expands words in lockstep, facilitating the insertion of interaction points at each word position. Interaction points enable us to solicit input during decoding, allowing users to interactively direct the decoding process. We simulate the behavior of user input using an oracle to give an automated evaluation, and show promise for the use of this method for text input.
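The decoding loop can be pictured as follows. This is a speculative sketch of "expanding words in lockstep with an interaction point at each word position": `extend_fn` stands in for the model's next-word expansion and `oracle` for the simulated user; both names and the control flow are hypothetical.

```python
def interactive_decode(extend_fn, oracle, max_words=10, beam=4):
    """Expand the hypothesis word by word, consulting the user at each step.

    extend_fn(prefix) -> list of (next_word, score), best first
    oracle(words)     -> the chosen word, or None to end the utterance;
                         this simulates user input at each interaction point
    """
    prefix = []
    for _ in range(max_words):
        candidates = extend_fn(prefix)[:beam]
        if not candidates:
            break
        word = oracle([w for w, _ in candidates])   # interaction point
        if word is None:
            break
        prefix.append(word)
    return " ".join(prefix)
```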

Zero/Few/One-Shot | Transfer | Adaptation (1 paper)

【1】 Data Centric Domain Adaptation for Historical Text with OCR Errors

Authors: Luisa März, Stefan Schweter, Nina Poerner, Benjamin Roth, Hinrich Schütze
Affiliations: Center for Information and Language Processing, Ludwig Maximilian University, Munich, Germany
Note: 14 pages, 2 figures, 6 tables
Link: https://arxiv.org/abs/2107.00927
Abstract: We propose new methods for in-domain and cross-domain Named Entity Recognition (NER) on historical data for Dutch and French. For the cross-domain case, we address domain shift by integrating unsupervised in-domain data via contextualized string embeddings, and OCR errors by injecting synthetic OCR errors into the source domain, addressing data-centric domain adaptation. We propose a general approach to imitate OCR errors in arbitrary input data. Our cross-domain as well as our in-domain results outperform several strong baselines and establish state-of-the-art results. We publish preprocessed versions of the French and Dutch Europeana NER corpora.
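As a hedged illustration of synthetic OCR noise injection, the snippet below corrupts clean source-domain text with substitution and deletion errors. The confusion table and rates are invented for the example; the paper derives its own general procedure for imitating OCR errors.

```python
import random

# Toy OCR confusion table (assumed); real confusions would be estimated
# from actual OCR output on historical documents.
CONFUSIONS = {"e": "c", "l": "1", "o": "0", "m": "rn", "i": "í"}

def inject_ocr_errors(text, p_sub=0.05, p_del=0.01, seed=0):
    """Corrupt text with substitution/deletion noise to mimic OCR output."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        if ch in CONFUSIONS and rng.random() < p_sub:
            out.append(CONFUSIONS[ch])        # OCR-style confusion substitution
        elif rng.random() < p_del:
            continue                          # character dropped by the scanner
        else:
            out.append(ch)
    return "".join(out)

print(inject_ocr_errors("De generale staten van Holland", p_sub=0.3))
```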

Corpora (1 paper)

【1】 DUKweb: Diachronic word representations from the UK Web Archive corpus

Authors: Adam Tsakalidis, Pierpaolo Basile, Marya Bazzi, Mihai Cucuringu, Barbara McGillivray
Affiliations: The Alan Turing Institute, London, UK; Queen Mary University of London, UK; University of Bari, Italy; University of Warwick, Coventry, UK; University of Oxford, UK
Note: 24 pages, 6 figures
Link: https://arxiv.org/abs/2107.01076
Abstract: Lexical semantic change (detecting shifts in the meaning and usage of words) is an important task for social and cultural studies as well as for Natural Language Processing applications. Diachronic word embeddings (time-sensitive vector representations of words that preserve their meaning) have become the standard resource for this task. However, given the significant computational resources needed for their generation, very few resources exist that make diachronic word embeddings available to the scientific community. In this paper we present DUKweb, a set of large-scale resources designed for the diachronic analysis of contemporary English. DUKweb was created from the JISC UK Web Domain Dataset (1996-2013), a very large archive which collects resources from the Internet Archive that were hosted on domains ending in '.uk'. DUKweb consists of a series of word co-occurrence matrices and two types of word embeddings for each year in the JISC UK Web Domain dataset. We show the reuse potential of DUKweb and its quality standards via a case study on word meaning change detection.
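A common way to use such yearly embeddings for meaning-change detection is to compare a word's vectors across years. The sketch below assumes the yearly spaces have already been aligned (e.g., via orthogonal Procrustes); the data structures and names are illustrative, not DUKweb's actual file format or API.

```python
import numpy as np

def semantic_change(word, emb_by_year, y0, y1):
    """Cosine distance between a word's vectors in two aligned yearly spaces.

    emb_by_year: dict year -> dict word -> np.ndarray (aligned embeddings)
    Returns a value in [0, 2]; larger means the word's usage drifted more.
    """
    a, b = emb_by_year[y0][word], emb_by_year[y1][word]
    return 1 - a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Ranking words by semantic_change(w, emb, 1996, 2013) surfaces candidate
# meaning shifts over the archive period (e.g., technology-related terms).
```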

Other Neural Networks | Deep Learning | Models | Modeling (2 papers)

【1】 SocialAI: Benchmarking Socio-Cognitive Abilities in Deep Reinforcement Learning Agents

Authors: Grgur Kovač, Rémy Portelas, Katja Hofmann, Pierre-Yves Oudeyer
Affiliations: Inria (FR), Microsoft Research (UK)
Note: Under review; arXiv admin note: substantial text overlap with arXiv:2104.13207
Link: https://arxiv.org/abs/2107.00956
Abstract: Building embodied autonomous agents capable of participating in social interactions with humans is one of the main challenges in AI. Within the Deep Reinforcement Learning (DRL) field, this objective motivated multiple works on embodied language use. However, current approaches focus on language as a communication tool in very simplified and non-diverse social situations: the "naturalness" of language is reduced to the concept of high vocabulary size and variability. In this paper, we argue that aiming towards human-level AI requires a broader set of key social skills: 1) language use in complex and variable social contexts; 2) beyond language, complex embodied communication in multimodal settings within constantly evolving social worlds. We explain how concepts from cognitive sciences could help AI to draw a roadmap towards human-like intelligence, with a focus on its social dimensions. As a first step, we propose to expand current research to a broader set of core social skills. To do this, we present SocialAI, a benchmark to assess the acquisition of social skills of DRL agents using multiple grid-world environments featuring other (scripted) social agents. We then study the limits of a recent SOTA DRL approach when tested on SocialAI and discuss important next steps towards proficient social agents. Videos and code are available at https://sites.google.com/view/socialai.

【2】 A Primer on Pretrained Multilingual Language Models

Authors: Sumanth Doddapaneni, Gowtham Ramesh, Anoop Kunchukuttan, Pratyush Kumar, Mitesh M. Khapra
Affiliations: Robert Bosch Center for Data Science and Artificial Intelligence, Indian Institute of Technology Madras; the AI4Bharat Initiative; Microsoft
Link: https://arxiv.org/abs/2107.00676
Abstract: Multilingual Language Models (MLLMs) such as mBERT, XLM, XLM-R, etc. have emerged as a viable option for bringing the power of pretraining to a large number of languages. Given their success in zero-shot transfer learning, there has emerged a large body of work in (i) building bigger MLLMs covering a large number of languages, (ii) creating exhaustive benchmarks covering a wider variety of tasks and languages for evaluating MLLMs, (iii) analysing the performance of MLLMs on monolingual, zero-shot cross-lingual and bilingual tasks, (iv) understanding the universal language patterns (if any) learnt by MLLMs, and (v) augmenting the (often) limited capacity of MLLMs to improve their performance on seen or even unseen languages. In this survey, we review the existing literature covering the above broad areas of research pertaining to MLLMs. Based on our survey, we recommend some promising directions of future research.

Other (2 papers)

【1】 Ethics Sheets for AI Tasks

Authors: Saif M. Mohammad
Affiliations: National Research Council Canada
Link: https://arxiv.org/abs/2107.01183
Abstract: Several high-profile events, such as the use of biased recidivism systems and mass testing of emotion recognition systems on vulnerable sub-populations, have highlighted how technology will often lead to more adverse outcomes for those that are already marginalized. In this paper, I will make a case for thinking about ethical considerations not just at the level of individual models and datasets, but also at the level of AI tasks. I will present a new form of such an effort, Ethics Sheets for AI Tasks, dedicated to fleshing out the assumptions and ethical considerations hidden in how a task is commonly framed and in the choices we make regarding the data, method, and evaluation. Finally, I will provide an example ethics sheet for automatic emotion recognition. Together with Data Sheets for datasets and Model Cards for AI systems, Ethics Sheets aid in the development and deployment of responsible AI systems.

【2】 An Investigation of the (In)effectiveness of Counterfactually Augmented Data

Authors: Nitish Joshi, He He
Affiliations: Department of Computer Science, New York University; Center for Data Science, New York University
Link: https://arxiv.org/abs/2107.00753
Abstract: While pretrained language models achieve excellent performance on natural language understanding benchmarks, they tend to rely on spurious correlations and generalize poorly to out-of-distribution (OOD) data. Recent work has explored using counterfactually-augmented data (CAD) -- data generated by minimally perturbing examples to flip the ground-truth label -- to identify robust features that are invariant under distribution shift. However, empirical results using CAD for OOD generalization have been mixed. To explain this discrepancy, we draw insights from a linear Gaussian model and demonstrate the pitfalls of CAD. Specifically, we show that (a) while CAD is effective at identifying robust features, it may prevent the model from learning unperturbed robust features, and (b) CAD may exacerbate existing spurious correlations in the data. Our results show that the lack of perturbation diversity in current CAD datasets limits its effectiveness on OOD generalization, calling for innovative crowdsourcing procedures to elicit diverse perturbation of examples.
