自然语言处理学术速递[9.10]

公众号-arXiv每日学术速递
发布2021-09-16 17:26:39

Update!H5支持摘要折叠,体验更佳!点击阅读原文访问arxivdaily.com,涵盖CS|物理|数学|经济|统计|金融|生物|电气领域,更有搜索、收藏等功能!

cs.CL 方向,今日共计58篇

Transformer(6篇)

【1】 Vision-and-Language or Vision-for-Language? On Cross-Modal Influence in Multimodal Transformers 标题:视觉与语言,还是视觉服务于语言?论多模态Transformer中的跨模态影响 链接:https://arxiv.org/abs/2109.04448

作者:Stella Frank,Emanuele Bugliarello,Desmond Elliott 机构:University of Trento, University of Copenhagen 备注:EMNLP 2021 摘要:预训练视觉和语言BERTs旨在学习结合两种模态信息的表征。我们提出了一种基于跨模态输入消融的诊断方法,以评估这些模型实际集成跨模态信息的程度。该方法涉及基于跨模态接地对齐,完全或选择性地消融一个模态的输入,并评估另一模态的模型预测性能。模型性能通过反映模型预训练目标的特定于模态的任务来衡量(例如,文本的掩码语言建模)。当一个模态缺少输入时,已经学会使用这两种模态构造跨模态表示的模型的性能预计会更差。我们发现,最近提出的模型在去除视觉信息时预测文本比在去除文本时预测视觉对象类别相对困难得多,这表明这些模型不是对称的跨模态模型。 摘要:Pretrained vision-and-language BERTs aim to learn representations that combine information from both modalities. We propose a diagnostic method based on cross-modal input ablation to assess the extent to which these models actually integrate cross-modal information. This method involves ablating inputs from one modality, either entirely or selectively based on cross-modal grounding alignments, and evaluating the model prediction performance on the other modality. Model performance is measured by modality-specific tasks that mirror the model pretraining objectives (e.g. masked language modelling for text). Models that have learned to construct cross-modal representations using both modalities are expected to perform worse when inputs are missing from a modality. We find that recently proposed models have much greater relative difficulty predicting text when visual information is ablated, compared to predicting visual object categories when text is ablated, indicating that these models are not symmetrically cross-modal.
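
下面给出跨模态输入消融这一评估思路的极简 Python 示意(非该论文的官方实现;其中 evaluate_mlm 等接口与数据均为假设的占位):

```python
# 跨模态输入消融评估的极简示意(假设性代码,仅演示流程)
import random

def evaluate_mlm(tokens, regions):
    # 占位实现:真实场景应调用预训练的视觉-语言模型做掩码词预测并返回准确率
    return random.random()

def ablation_gap(examples):
    # 比较「完整视觉输入」与「视觉输入全部消融」两种条件下的 MLM 准确率差距
    full, ablated = [], []
    for tokens, regions in examples:
        full.append(evaluate_mlm(tokens, regions))
        ablated.append(evaluate_mlm(tokens, []))  # 将视觉区域全部置空,即完全消融
    return sum(full) / len(full) - sum(ablated) / len(ablated)  # 差距越大,说明模型越依赖视觉模态

examples = [(["a", "dog", "[MASK]", "a", "ball"], ["region_1", "region_2"])] * 4
print(ablation_gap(examples))
```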

【2】 TxT: Crossmodal End-to-End Learning with Transformers 标题:TxT:基于Transformer的跨模态端到端学习 链接:https://arxiv.org/abs/2109.04422

作者:Jan-Martin O. Steitz,Jonas Pfeiffer,Iryna Gurevych,Stefan Roth 机构: Department of Computer Science, TU Darmstadt, Germany, hessian.AI, Germany 备注:To appear at the 43rd DAGM German Conference on Pattern Recognition (GCPR) 2021 摘要:通过多种方式进行推理,例如在视觉问答(VQA)中,需要跨领域对齐语义概念。尽管端到端学习取得了广泛的成功,但今天的多模式管道基本上利用了从对象检测器预提取的固定特征,通常更快的R-CNN,作为视觉世界的表示。明显的缺点是,视觉表现并没有专门针对手头的多模态任务进行调整。与此同时,虽然基于Transformer的对象检测器已经得到普及,但它们还没有被用于今天的多模管道中。我们使用TxT解决了这两个缺点,TxT是一种基于转换器的跨模式管道,能够以完全端到端的方式对下游任务的语言和视觉组件进行微调。我们克服了基于转换器的检测器在全局上下文集成和可伸缩性方面的多模态推理的现有局限性。我们基于Transformer的多模态模型从多模态问答的端到端学习中获得了可观的收益。 摘要:Reasoning over multiple modalities, e.g. in Visual Question Answering (VQA), requires an alignment of semantic concepts across domains. Despite the widespread success of end-to-end learning, today's multimodal pipelines by and large leverage pre-extracted, fixed features from object detectors, typically Faster R-CNN, as representations of the visual world. The obvious downside is that the visual representation is not specifically tuned to the multimodal task at hand. At the same time, while transformer-based object detectors have gained popularity, they have not been employed in today's multimodal pipelines. We address both shortcomings with TxT, a transformer-based crossmodal pipeline that enables fine-tuning both language and visual components on the downstream task in a fully end-to-end manner. We overcome existing limitations of transformer-based detectors for multimodal reasoning regarding the integration of global context and their scalability. Our transformer-based multimodal model achieves considerable gains from end-to-end learning for multimodal question answering.

【3】 All Bark and No Bite: Rogue Dimensions in Transformer Language Models Obscure Representational Quality 标题:虚张声势:Transformer语言模型中的流氓维度掩盖了表征质量 链接:https://arxiv.org/abs/2109.04404

作者:William Timkey,Marten van Schijndel 机构:Department of Linguistics, Cornell University 备注:Accepted at EMNLP 2021 摘要:相似性度量是理解语言模型如何表示和处理语言的重要工具。标准的表征相似性度量(如余弦相似性和欧氏距离)已成功地应用于静态单词嵌入模型中,以了解单词在语义空间中如何聚类。最近,这些措施已被应用于背景化模型(如BERT和GPT-2)的嵌入。在这项工作中,我们对语境化语言模型的这些度量的信息性提出了质疑。我们发现,少数流氓维度(通常只有1-3)主导了这些度量。此外,我们发现主导相似性度量的维度与对模型行为重要的维度之间存在显著的不匹配。我们表明,简单的后处理技术,如标准化,能够纠正流氓尺寸和揭示潜在的代表性质量。我们认为,对于上下文语言模型的任何基于相似性的分析来说,考虑恶意维度是必不可少的。 摘要:Similarity measures are a vital tool for understanding how language models represent and process language. Standard representational similarity measures such as cosine similarity and Euclidean distance have been successfully used in static word embedding models to understand how words cluster in semantic space. Recently, these measures have been applied to embeddings from contextualized models such as BERT and GPT-2. In this work, we call into question the informativity of such measures for contextualized language models. We find that a small number of rogue dimensions, often just 1-3, dominate these measures. Moreover, we find a striking mismatch between the dimensions that dominate similarity measures and those which are important to the behavior of the model. We show that simple postprocessing techniques such as standardization are able to correct for rogue dimensions and reveal underlying representational quality. We argue that accounting for rogue dimensions is essential for any similarity-based analysis of contextual language models.
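
下面用一个小的 Python 示例示意「少数异常维度主导余弦相似度、而标准化可将其纠正」的现象(数据为随机构造,仅用于演示,非论文代码):

```python
import numpy as np

rng = np.random.default_rng(0)
emb = rng.normal(size=(100, 768))      # 模拟 100 个 token 的上下文嵌入
emb[:, 0] += 100.0                     # 人为制造一个量级极大的「流氓维度」

def mean_cosine(x):
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = x @ x.T
    return sims[np.triu_indices(len(x), k=1)].mean()

print("原始嵌入的平均余弦相似度:", mean_cosine(emb))            # 被流氓维度大幅推高
standardized = (emb - emb.mean(axis=0)) / emb.std(axis=0)        # 逐维标准化(z-score)
print("标准化后的平均余弦相似度:", mean_cosine(standardized))   # 接近 0,更能反映整体表征
```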

【4】 MATE: Multi-view Attention for Table Transformer Efficiency 标题:MATE:面向表格Transformer效率的多视角注意力 链接:https://arxiv.org/abs/2109.04312

作者:Julian Martin Eisenschlos,Maharshi Gor,Thomas Müller,William W. Cohen 机构:Google Research, Dept. of Computer Science, University of Maryland, Symanto Research, Valencia, Spain 备注:Accepted to EMNLP 2021 摘要:这项工作提出了一种稀疏注意力Transformer体系结构,用于对包含大型表的文档进行建模。表格在网络上无处不在,而且信息丰富。然而,web上超过20%的关系表具有20行或更多行(Cafarella等人,2008),这些大型表对当前的Transformer模型提出了挑战,这类模型通常限于512个令牌。在这里,我们提出MATE,一种新的Transformer体系结构,用于对web表的结构进行建模。MATE使用稀疏注意力的方式,使头部能够有效地关注表中的行或列。该体系结构在速度和内存方面呈线性扩展,可以使用当前加速器处理包含8000多个令牌的文档。MATE还对表格数据具有更合适的归纳偏差,并在三个表格推理数据集上创造了新的最先进水平。对于HybridQA(Chen等人,2020b),一个涉及包含表格的大型文档的数据集,我们将此前的最佳结果提高了19个点。 摘要:This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However, more than 20% of relational tables on the web have 20 or more rows (Cafarella et al., 2008), and these large tables present a challenge for current Transformer models, which are typically limited to 512 tokens. Here we propose MATE, a novel Transformer architecture designed to model the structure of web tables. MATE uses sparse attention in a way that allows heads to efficiently attend to either rows or columns in a table. This architecture scales linearly with respect to speed and memory, and can handle documents containing more than 8000 tokens with current accelerators. MATE also has a more appropriate inductive bias for tabular data, and sets a new state-of-the-art for three table reasoning datasets. For HybridQA (Chen et al., 2020b), a dataset that involves large documents containing tables, we improve the best prior result by 19 points.
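
下面是行/列稀疏注意力掩码构造思路的极简 Python 示意(掩码形式为笔者假设的简化版本,非论文实现):

```python
import numpy as np

def table_attention_mask(n_rows, n_cols, mode):
    # 为按行优先展平的表格单元构造注意力掩码:
    # mode='row' 时只允许关注同一行的单元,mode='col' 时只允许关注同一列的单元
    n = n_rows * n_cols
    rows = np.arange(n) // n_cols
    cols = np.arange(n) % n_cols
    if mode == "row":
        return rows[:, None] == rows[None, :]
    return cols[:, None] == cols[None, :]

row_mask = table_attention_mask(3, 4, "row")   # 「行注意力头」使用的掩码
col_mask = table_attention_mask(3, 4, "col")   # 「列注意力头」使用的掩码
print(row_mask.sum(), col_mask.sum())          # 非零项远少于 12*12=144,体现稀疏性
```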

【5】 What's Hidden in a One-layer Randomly Weighted Transformer? 标题:单层随机加权Transformer里隐藏着什么? 链接:https://arxiv.org/abs/2109.03939

作者:Sheng Shen,Zhewei Yao,Douwe Kiela,Kurt Keutzer,Michael W. Mahoney 机构:†UC Berkeley; ‡Facebook AI Research 备注:EMNLP 2021 (short) 摘要:我们证明,隐藏在一层随机加权神经网络中的子网络可以在机器翻译任务中实现令人印象深刻的性能,而无需修改权重初始化。为了寻找单层随机加权神经网络的子网络,我们对同一权重矩阵应用不同的二元掩码来生成不同的层。隐藏在一层随机加权Transformer中,我们发现在IWSLT14/WMT14上可以实现29.45/17.29 BLEU的子网络。使用固定的预训练嵌入层,先前发现的子网络比经过训练的基于IWSLT14/WMT14的小型Transformer的性能小,但可以达到98%/92%(34.14/25.24 BLEU)。此外,我们还演示了在此设置中更大和更深Transformer的有效性,以及不同初始化方法的影响。我们在上发布了源代码https://github.com/sIncerass/one_layer_lottery_ticket. 摘要:We demonstrate that, hidden within one-layer randomly weighted neural networks, there exist subnetworks that can achieve impressive performance, without ever modifying the weight initializations, on machine translation tasks. To find subnetworks for one-layer randomly weighted neural networks, we apply different binary masks to the same weight matrix to generate different layers. Hidden within a one-layer randomly weighted Transformer, we find that subnetworks that can achieve 29.45/17.29 BLEU on IWSLT14/WMT14. Using a fixed pre-trained embedding layer, the previously found subnetworks are smaller than, but can match 98%/92% (34.14/25.24 BLEU) of the performance of, a trained Transformer small/base on IWSLT14/WMT14. Furthermore, we demonstrate the effectiveness of larger and deeper transformers in this setting, as well as the impact of different initialization methods. We released the source code at https://github.com/sIncerass/one_layer_lottery_ticket.
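
下面用几行 Python 示意「同一随机权重矩阵配合不同二元掩码即可充当不同层」的想法(此处掩码为随机生成,真实方法需要搜索掩码,代码仅为假设性示意):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(512, 512))        # 一层共享的随机权重矩阵,始终不更新

def layer_forward(x, W, mask):
    # 每一「层」只是对同一 W 施加不同的二元掩码后做一次线性变换加 ReLU
    return np.maximum(x @ (W * mask), 0.0)

masks = [rng.random(W.shape) < 0.5 for _ in range(6)]   # 6 个不同掩码,对应 6 个「层」
x = rng.normal(size=(1, 512))
for m in masks:
    x = layer_forward(x, W, m)
print(x.shape)
```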

【6】 Transformers in the loop: Polarity in neural models of language 标题:环路中的Transformer:语言神经模型中的极性 链接:https://arxiv.org/abs/2109.03926

作者:Lisa Bylinina,Alexey Tikhonov 机构:Bookarang, Yandex 摘要:计算语言模型中语言现象的表示通常是根据现有语言理论对这些现象的预测进行评估的。将极性概念作为案例研究,我们表明,这并不总是最适当的设置。我们在两个预先训练过的Transformer模型(BERT和GPT-2)中通过所谓的“负极性项”(特别是英语“any”)来探测极性。我们发现——至少对于极性而言——从语言模型中得出的指标比语言理论的预测更符合心理语言学实验的数据。通过建立这一模型,我们可以更充分地评估语言模型的性能,也可以使用语言模型发现超越现有语言理论的自然语言语法的新见解。总的来说,我们的结果鼓励在人类实验和语言模型之间建立更紧密的联系。我们提出了实现这种紧密联系的方法,将语言模型作为实验管道的一部分,并展示了这种管道的工作情况。 摘要:Representation of linguistic phenomena in computational language models is typically assessed against the predictions of existing linguistic theories of these phenomena. Using the notion of polarity as a case study, we show that this is not always the most adequate set-up. We probe polarity via so-called 'negative polarity items' (in particular, English 'any') in two pre-trained Transformer-based models (BERT and GPT-2). We show that -- at least for polarity -- metrics derived from language models are more consistent with data from psycholinguistic experiments than linguistic theory predictions. Establishing this allows us to more adequately evaluate the performance of language models and also to use language models to discover new insights into natural language grammar beyond existing linguistic theories. Overall, our results encourage a closer tie between experiments with human subjects and with language models. We propose methods to enable this closer tie, with language models as part of experimental pipeline, and show this pipeline at work.

QA|VQA|问答|对话(2篇)

【1】 Thinking Clearly, Talking Fast: Concept-Guided Non-Autoregressive Generation for Open-Domain Dialogue Systems 标题:思路清晰,语速快:概念引导的开放领域对话系统非自回归生成 链接:https://arxiv.org/abs/2109.04084

作者:Yicheng Zou,Zhihua Liu,Xingwu Hu,Qi Zhang 机构:Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, School of Computer Science, Fudan University, Shanghai, China 备注:Accepted by EMNLP 2021, 12 pages 摘要:人类对话包含不断演变的概念,说话者自然会将多个概念联系起来,形成一个回应。然而,目前使用seq2seq框架的对话模型缺乏有效管理概念转换的能力,并且很难以顺序解码的方式将多个概念引入到响应中。为了促进可控和连贯的对话,在这项工作中,我们设计了一个概念引导的非自回归模型(CG nAR)来生成开放域对话。提出的模型包括一个多概念规划模块,该模块学习从概念图中识别多个相关概念,以及一个定制的插入转换器,该转换器执行概念引导的非自回归生成以完成响应。在两个公共数据集上的实验结果表明,CG nAR可以产生多样且一致的响应,在自动和人工评估方面都优于最先进的基线,推理速度大大加快。 摘要:Human dialogue contains evolving concepts, and speakers naturally associate multiple concepts to compose a response. However, current dialogue models with the seq2seq framework lack the ability to effectively manage concept transitions and can hardly introduce multiple concepts to responses in a sequential decoding manner. To facilitate a controllable and coherent dialogue, in this work, we devise a concept-guided non-autoregressive model (CG-nAR) for open-domain dialogue generation. The proposed model comprises a multi-concept planning module that learns to identify multiple associated concepts from a concept graph and a customized Insertion Transformer that performs concept-guided non-autoregressive generation to complete a response. The experimental results on two public datasets show that CG-nAR can produce diverse and coherent responses, outperforming state-of-the-art baselines in both automatic and human evaluations with substantially faster inference speed.

【2】 Weakly-Supervised Visual-Retriever-Reader for Knowledge-based Question Answering 标题:面向基于知识问答的弱监督视觉检索器-阅读器 链接:https://arxiv.org/abs/2109.04014

作者:Man Luo,Yankai Zeng,Pratyay Banerjee,Chitta Baral 机构:Arizona State University 备注:accepted at EMNLP 2021 摘要:基于知识的视觉问答(VQA)要求在回答问题时,除了图像内容外,还需要使用外部知识。一个主要用于评估基于知识的VQA的数据集是OK-VQA,但它缺乏用于检索的金标准知识语料库。现有工作利用不同的知识库(如概念网和维基百科)获取外部知识。由于知识库的不同,很难公平地比较模型的性能。为了解决这个问题,我们收集了一个可用于任何VQA系统的自然语言知识库。此外,我们还提出了一个视觉检索器-阅读器管道来实现基于知识的VQA。视觉检索器的目的是检索相关知识,而视觉阅读器则试图根据给定的知识预测答案。我们介绍了使用文本和图像检索知识的各种方法以及两种阅读器样式:分类和提取。检索器和阅读器都是在弱监督下训练的。我们的实验结果表明,一个好的检索器可以显著提高阅读器在OK-VQA挑战中的性能。代码和语料库见 https://github.com/luomancs/retriever_reader_for_okvqa.git 摘要:Knowledge-based visual question answering (VQA) requires answering questions with external knowledge in addition to the content of images. One dataset that is mostly used in evaluating knowledge-based VQA is OK-VQA, but it lacks a gold standard knowledge corpus for retrieval. Existing work leverage different knowledge bases (e.g., ConceptNet and Wikipedia) to obtain external knowledge. Because of varying knowledge bases, it is hard to fairly compare models' performance. To address this issue, we collect a natural language knowledge base that can be used for any VQA system. Moreover, we propose a Visual Retriever-Reader pipeline to approach knowledge-based VQA. The visual retriever aims to retrieve relevant knowledge, and the visual reader seeks to predict answers based on given knowledge. We introduce various ways to retrieve knowledge using text and images and two reader styles: classification and extraction. Both the retriever and reader are trained with weak supervision. Our experimental results show that a good retriever can significantly improve the reader's performance on the OK-VQA challenge. The code and corpus are provided in https://github.com/luomancs/retriever_reader_for_okvqa.git

机器翻译(7篇)

【1】 HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints 标题:HintedBT:使用质量和音译提示增强回译 链接:https://arxiv.org/abs/2109.04443

作者:Sahana Ramnath,Melvin Johnson,Abhirut Gupta,Aravindan Raghuveer 机构:Google Research 备注:17 pages including references and appendix. Accepted at EMNLP 2021 摘要:目标单语语料库的反向翻译(BT)是一种广泛应用于神经机器翻译(NMT)的数据增强策略,尤其是对于低资源语言对。为了提高可用BT数据的有效性,我们引入了HintedBT——一系列向编码器和解码器提供提示(通过标签)的技术。首先,我们提出了一种同时使用高质量和低质量BT数据的新方法,通过向模型提供关于每个源-目标对质量的提示(如编码器上的源标记)。我们没有过滤掉低质量的数据,而是表明这些提示使模型能够有效地从噪声数据中学习。其次,我们解决了预测源标记是否需要翻译或音译为目标语言的问题,这在跨脚本翻译任务中很常见(即,源和目标不共享书面脚本)。对于这种情况,我们建议使用额外的提示(如解码器上的目标标记)来训练模型,这些提示提供有关源上所需操作(翻译或翻译和音译)的信息。我们对标准WMT基准测试进行了实验和详细分析,测试对象为三对跨脚本低/中资源语言:{印地语、古吉拉特语、泰米尔语}到英语。我们的方法与五个强大且完善的基线进行了比较。我们发现,使用这些提示,无论是单独使用还是结合使用,都能显著提高翻译质量,并在相应的双语环境中,在所有三种语言对中取得最先进的表现。 摘要:Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve effectiveness of the available BT data, we introduce HintedBT -- a family of techniques which provides hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high and low quality BT data by providing hints (as source tags on the encoder) to the model about the quality of each source-target pair. We don't filter out low quality data but instead show that these hints enable the model to learn effectively from noisy data. Second, we address the problem of predicting whether a source token needs to be translated or transliterated to the target language, which is common in cross-script translation tasks (i.e., where source and target do not share the written script). For such cases, we propose training the model with additional hints (as target tags on the decoder) that provide information about the operation required on the source (translation or both translation and transliteration). We conduct experiments and detailed analyses on standard WMT benchmarks for three cross-script low/medium-resource language pairs: {Hindi,Gujarati,Tamil}-to-English. Our methods compare favorably with five strong and well established baselines. We show that using these hints, both separately and together, significantly improves translation quality and leads to state-of-the-art performance in all three language pairs in corresponding bilingual settings.
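
下面给出「按质量分桶、在源句前拼接提示标签」这一做法的极简 Python 示意(标签名称与分桶阈值均为假设,非论文原始设置):

```python
def add_quality_hint(source, quality_score):
    # 按质量分数分桶,在回译得到的源句前拼接一个特殊标签,作为编码器侧的提示
    if quality_score >= 0.8:
        tag = "<bt_high>"
    elif quality_score >= 0.5:
        tag = "<bt_mid>"
    else:
        tag = "<bt_low>"
    return f"{tag} {source}"

bt_pairs = [("yah kitab acchi hai", "this book is good", 0.92),
            ("vah ja raha", "he going", 0.41)]
for src, tgt, score in bt_pairs:
    print(add_quality_hint(src, score), "=>", tgt)
```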

【2】 Generalised Unsupervised Domain Adaptation of Neural Machine Translation with Cross-Lingual Data Selection 标题:具有跨语言数据选择的神经机器翻译的广义无监督领域自适应 链接:https://arxiv.org/abs/2109.04292

作者:Thuy-Trang Vu,Xuanli He,Dinh Phung,Gholamreza Haffari 机构:Department of Data Science and AI, Monash University, Australia 备注:EMNLP2021 摘要:本文考虑神经网络机器翻译(NMT)中的无监督域自适应问题,我们假设在新域中仅访问源语或目标语中的单语文本。我们提出了一种跨语言数据选择方法,从大型通用单语语料库中提取缺失语言侧的领域内句子。我们提出的方法通过对比学习在多语言BERT的基础上训练一个自适应层,使源语言和目标语言之间的表示保持一致。然后,这使得域分类器能够以零触发方式在语言之间进行转移。一旦分类器检测到域内数据,NMT模型通过联合学习翻译和域识别任务来适应新域。我们评估了我们的跨语言NMT数据选择方法,该方法跨越三种语言对中的五个不同领域,以及对新冠病毒-19翻译的真实场景。结果表明,在BLEU分数为+1.5的情况下,我们提出的方法优于其他选择基线。 摘要:This paper considers the unsupervised domain adaptation problem for neural machine translation (NMT), where we assume the access to only monolingual text in either the source or target language in the new domain. We propose a cross-lingual data selection method to extract in-domain sentences in the missing language side from a large generic monolingual corpus. Our proposed method trains an adaptive layer on top of multilingual BERT by contrastive learning to align the representation between the source and target language. This then enables the transferability of the domain classifier between the languages in a zero-shot manner. Once the in-domain data is detected by the classifier, the NMT model is then adapted to the new domain by jointly learning translation and domain discrimination tasks. We evaluate our cross-lingual data selection method on NMT across five diverse domains in three language pairs, as well as a real-world scenario of translation for COVID-19. The results show that our proposed method outperforms other selection baselines up to +1.5 BLEU score.

【3】 Distributionally Robust Multilingual Machine Translation 标题:分布鲁棒的多语言机器翻译 链接:https://arxiv.org/abs/2109.04020

作者:Chunting Zhou,Daniel Levy,Xian Li,Marjan Ghazvininejad,Graham Neubig 机构:Language Technologies Institute, Carnegie Mellon University, Stanford University, Facebook AI 备注:Long paper accepted by EMNLP2021 main conference 摘要:多语言神经机器翻译(MNMT)学习用一个模型翻译多个语言对,潜在地提高了部署模型的准确性和存储器效率。然而,语言之间严重的数据不平衡阻碍了模型在语言对之间的统一执行。在本文中,我们提出了一个新的基于分布鲁棒优化的MNMT学习目标,该目标最小化语言对集合上最坏情况下的期望损失。我们进一步展示了如何使用迭代最佳响应方案对大型翻译语料库的这一目标进行实际优化,与标准经验风险最小化相比,该方案既有效又产生可忽略的额外计算成本。我们在两个数据集中的三组语言上进行了广泛的实验,结果表明,在多对一和一对多翻译设置下,我们的方法在平均和每种语言的性能方面始终优于强基线方法。 摘要:Multilingual neural machine translation (MNMT) learns to translate multiple language pairs with a single model, potentially improving both the accuracy and the memory-efficiency of deployed models. However, the heavy data imbalance between languages hinders the model from performing uniformly across language pairs. In this paper, we propose a new learning objective for MNMT based on distributionally robust optimization, which minimizes the worst-case expected loss over the set of language pairs. We further show how to practically optimize this objective for large translation corpora using an iterated best response scheme, which is both effective and incurs negligible additional computational cost compared to standard empirical risk minimization. We perform extensive experiments on three sets of languages from two datasets and show that our method consistently outperforms strong baseline methods in terms of average and per-language performance under both many-to-one and one-to-many translation settings.
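
下面用一个小例子对比标准经验风险与「最坏情况期望损失」两种目标(损失数值为虚构,仅示意分布鲁棒优化关注最差语言对的思路,并非论文的迭代最优响应实现):

```python
import numpy as np

# 假设每个语言对在当前模型下的平均训练损失(数值为虚构)
pair_losses = {"en-hi": 3.1, "en-fr": 1.2, "en-gu": 4.0, "en-de": 1.5}

erm_objective = np.mean(list(pair_losses.values()))   # 标准经验风险:对语言对取平均
dro_objective = max(pair_losses.values())             # 分布鲁棒目标:最坏情况语言对的损失

worst_pair = max(pair_losses, key=pair_losses.get)
print(f"ERM 平均损失 {erm_objective:.2f};DRO 最坏情况损失 {dro_objective:.2f}(来自 {worst_pair})")
# 训练时可据此对最差语言对加大采样或加权,从而使各语言对的性能更均衡
```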

【4】 Competence-based Curriculum Learning for Multilingual Machine Translation 标题:基于能力的多语言机器翻译课程学习 链接:https://arxiv.org/abs/2109.04002

作者:Mingliang Zhang,Fandong Meng,Yunhai Tong,Jie Zhou 机构:Key Laboratory of Machine Perception, School of EECS, Peking University, Pattern Recognition Center, WeChat AI, Tencent Inc, China 备注:Accepted by Findings of EMNLP 2021. We release the codes at this https URL 摘要:目前,多语言机器翻译因其为低资源语言(LRL)带来更好的性能和节省更多的空间而受到越来越多的关注。然而,现有的多语言机器翻译模型面临着一个严峻的挑战:不平衡。因此,在多语言翻译模型中,不同语言的翻译性能差异很大。我们认为,这种不平衡问题源于不同语言的不同学习能力。因此,我们专注于平衡不同语言的学习能力,并提出了基于能力的多语言机器翻译课程学习(CCL-M)。具体而言,我们首先定义了两种能力,以帮助安排高资源语言(HRL)和低资源语言:1)自评能力,评估语言本身的学习情况;2)HRLs评估能力,根据HRLs的自我评估能力评估LRL是否准备好学习。基于上述能力,我们利用所提出的CCL-M算法,以课程学习的方式逐步向训练集中添加新的语言。此外,我们还提出了一种新的能力感知动态平衡抽样策略,以更好地选择多语言训练中的训练样本。实验结果表明,与以前在TED talks数据集上的最新方法相比,我们的方法实现了稳定而显著的性能提升。 摘要:Currently, multilingual machine translation is receiving more and more attention since it brings better performance for low resource languages (LRLs) and saves more space. However, existing multilingual machine translation models face a severe challenge: imbalance. As a result, the translation performance of different languages in multilingual translation models are quite different. We argue that this imbalance problem stems from the different learning competencies of different languages. Therefore, we focus on balancing the learning competencies of different languages and propose Competence-based Curriculum Learning for Multilingual Machine Translation, named CCL-M. Specifically, we firstly define two competencies to help schedule the high resource languages (HRLs) and the low resource languages: 1) Self-evaluated Competence, evaluating how well the language itself has been learned; and 2) HRLs-evaluated Competence, evaluating whether an LRL is ready to be learned according to HRLs' Self-evaluated Competence. Based on the above competencies, we utilize the proposed CCL-M algorithm to gradually add new languages into the training set in a curriculum learning manner. Furthermore, we propose a novel competenceaware dynamic balancing sampling strategy for better selecting training samples in multilingual training. Experimental results show that our approach has achieved a steady and significant performance gain compared to the previous state-of-the-art approach on the TED talks dataset.

【5】 Ensemble Fine-tuned mBERT for Translation Quality Estimation 标题:用于翻译质量评估的集成微调mBERT 链接:https://arxiv.org/abs/2109.03914

作者:Shaika Chowdhury,Naouel Baili,Brian Vannah 机构:University of Illinois at Chicago, US, IQVIA, US 备注:The Sixth Conference on Machine Translation, WMT 2021 摘要:质量评估(QE)是机器翻译工作流程的一个重要组成部分,因为它在不咨询参考译文的情况下评估翻译输出的质量。在本文中,我们讨论了我们提交给WMT 2021 QE共享任务的情况。我们参与任务2的句子级子任务,挑战参与者预测句子级后期编辑工作的HTER分数。我们提出的系统是基于多语言BERT(mBERT)的回归模型的集合,这些模型通过微调不同的输入设置生成。它在皮尔逊相关性方面表现出了可比性,并在多个语言对的MAE/RMSE方面优于基线系统。此外,我们通过利用目标语言相关的语言对和伪参考翻译,使我们的系统适应Zero-Shot设置。 摘要:Quality Estimation (QE) is an important component of the machine translation workflow as it assesses the quality of the translated output without consulting reference translations. In this paper, we discuss our submission to the WMT 2021 QE Shared Task. We participate in Task 2 sentence-level sub-task that challenge participants to predict the HTER score for sentence-level post-editing effort. Our proposed system is an ensemble of multilingual BERT (mBERT)-based regression models, which are generated by fine-tuning on different input settings. It demonstrates comparable performance with respect to the Pearson's correlation and beats the baseline system in MAE/ RMSE for several language pairs. In addition, we adapt our system for the zero-shot setting by exploiting target language-relevant language pairs and pseudo-reference translations.

【6】 Collecting a Large-Scale Gender Bias Dataset for Coreference Resolution and Machine Translation 标题:收集大规模性别偏见数据集用于指代消解和机器翻译 链接:https://arxiv.org/abs/2109.03858

作者:Shahar Levy,Koren Lazar,abriel Stanovsky 机构:School of Computer Science and Engineering, The Hebrew University of Jerusalem, Jerusalem, Israel 备注:Accepted to Findings of EMNLP 2021 摘要:最近的研究发现,在机器翻译和共指消解模型中存在性别偏见,这些模型主要使用合成诊断数据集。虽然这些在受控实验中量化了偏差,但它们通常在小范围内进行量化,并且主要由人工的、分布不均的句子组成。在这项工作中,我们在三个领域的语料库中发现了表明刻板和非刻板性别角色分配的语法模式(例如,女性护士与男性舞者),从而形成了第一个大规模的性别偏见数据集,包含108K个不同的真实英语句子。我们手动验证语料库的质量,并使用它来评估各种共指消解和机器翻译模型中的性别偏见。我们发现,所有经过测试的模型在呈现自然输入时往往过度依赖性别刻板印象,这在商业系统中部署时可能特别有害。最后,我们展示了我们的数据集有助于微调共指消解模型,发现它减轻了对保留集的偏见。我们的数据集和模型可在www.github.com/SLAB-NLP/BUG上公开获取。我们希望他们能推动未来在现实环境中对性别偏见评估缓解技术的研究。 摘要:Recent works have found evidence of gender bias in models of machine translation and coreference resolution using mostly synthetic diagnostic datasets. While these quantify bias in a controlled experiment, they often do so on a small scale and consist mostly of artificial, out-of-distribution sentences. In this work, we find grammatical patterns indicating stereotypical and non-stereotypical gender-role assignments (e.g., female nurses versus male dancers) in corpora from three domains, resulting in a first large-scale gender bias dataset of 108K diverse real-world English sentences. We manually verify the quality of our corpus and use it to evaluate gender bias in various coreference resolution and machine translation models. We find that all tested models tend to over-rely on gender stereotypes when presented with natural inputs, which may be especially harmful when deployed in commercial systems. Finally, we show that our dataset lends itself to finetuning a coreference resolution model, finding it mitigates bias on a held out set. Our dataset and models are publicly available at www.github.com/SLAB-NLP/BUG. We hope they will spur future research into gender bias evaluation mitigation techniques in realistic settings.

【7】 Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring 标题:采用并行自回归重打分的非自回归端到端语音翻译 链接:https://arxiv.org/abs/2109.04411

作者:Hirofumi Inaguma,Yosuke Higuchi,Kevin Duh,Tatsuya Kawahara,Shinji Watanabe 机构: Higuchi is with Waseda University 摘要:本文描述了一种基于非自回归(NAR)模型的高效端到端语音翻译(E2E-ST)框架。与传统的级联系统相比,端到端语音翻译模型具有一些优势,如减少推理延迟。然而,传统的AR解码方法不够快,因为每个令牌都是增量生成的。而NAR模型则可以通过基于令牌条件独立性假设并行生成多个令牌来加快解码速度。我们提出了一个称为Orthros的统一NAR E2E-ST框架,该框架在共享编码器的基础上有一个NAR解码器和一个辅助浅AR解码器。辅助浅AR解码器通过并行地对NAR解码器生成的多个候选进行重打分(并行AR重打分)来选择最佳假设。我们采用条件掩蔽语言模型(CMLM)和基于连接主义时间分类(CTC)的模型作为Orthros的NAR解码器,分别称为Orthros-CMLM和Orthros-CTC。我们还提出了两种训练方法来增强CMLM解码器。在三个具有六种语言方向的基准数据集上进行的实验评估表明,与基线NAR模型相比,Orthros在翻译质量方面取得了很大的改进,但开销非常小。此外,Conformer编码器体系结构实现了大量质量改进,特别是对于基于CTC的模型。采用Conformer编码器的Orthros-CTC在CPU上的解码速度提高了3.63倍,翻译质量与AR模型相当。 摘要:This article describes an efficient end-to-end speech translation (E2E-ST) framework based on non-autoregressive (NAR) models. End-to-end speech translation models have several advantages over traditional cascade systems such as inference latency reduction. However, conventional AR decoding methods are not fast enough because each token is generated incrementally. NAR models, however, can accelerate the decoding speed by generating multiple tokens in parallel on the basis of the token-wise conditional independence assumption. We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder. The auxiliary shallow AR decoder selects the best hypothesis by rescoring multiple candidates generated from the NAR decoder in parallel (parallel AR rescoring). We adopt conditional masked language model (CMLM) and a connectionist temporal classification (CTC)-based model as NAR decoders for Orthros, referred to as Orthros-CMLM and Orthros-CTC, respectively. We also propose two training methods to enhance the CMLM decoder. Experimental evaluations on three benchmark datasets with six language directions demonstrated that Orthros achieved large improvements in translation quality with a very small overhead compared with the baseline NAR model. Moreover, the Conformer encoder architecture enabled large quality improvements, especially for CTC-based models. Orthros-CTC with the Conformer encoder increased decoding speed by 3.63x on CPU with translation quality comparable to that of an AR model.

语义分析(3篇)

【1】 Translate & Fill: Improving Zero-Shot Multilingual Semantic Parsing with Synthetic Data 标题:翻译与填充:利用合成数据改进零样本多语言语义解析 链接:https://arxiv.org/abs/2109.04319

作者:Massimo Nicosia,Zhongdi Qu,Yasemin Altun 机构:Google Research 备注:Accepted to EMNLP 2021 (Findings) 摘要:虽然在单一语言上进行微调的多语言预训练语言模型(LMs)显示出了强大的跨语言任务转移能力,但在目标语言监控可用的情况下,语义分析任务仍存在很大的性能差距。在本文中,我们提出了一种新的翻译和填充(TaF)方法来为多语言语义分析器生成白银训练数据。该方法简化了流行的Translate-Align项目(TAP)管道,并由一个序列到序列填充模型组成,该模型构建了一个以话语和相同语法视图为条件的完整语法分析。我们的填充者只接受英语数据的训练,但能够以零拍方式准确地完成其他语言的实例(即英语训练话语的翻译)。在三个多语种语义分析数据集上的实验结果表明,使用TaF进行的数据扩充达到了与依赖传统对齐技术的类似系统相比的精度。 摘要:While multilingual pretrained language models (LMs) fine-tuned on a single language have shown substantial cross-lingual task transfer capabilities, there is still a wide performance gap in semantic parsing tasks when target language supervision is available. In this paper, we propose a novel Translate-and-Fill (TaF) method to produce silver training data for a multilingual semantic parser. This method simplifies the popular Translate-Align-Project (TAP) pipeline and consists of a sequence-to-sequence filler model that constructs a full parse conditioned on an utterance and a view of the same parse. Our filler is trained on English data only but can accurately complete instances in other languages (i.e., translations of the English training utterances), in a zero-shot fashion. Experimental results on three multilingual semantic parsing datasets show that data augmentation with TaF reaches accuracies competitive with similar systems which rely on traditional alignment techniques.

【2】 Lexico-semantic and affective modelling of Spanish poetry: A semi-supervised learning approach 标题:西班牙诗歌的词汇语义和情感建模:一种半监督学习方法 链接:https://arxiv.org/abs/2109.04152

作者:Alberto Barbado,María Dolores González,Débora Carrera 机构:Universidad Polit´ecnica de Madrid, Departamento de Inteligencia Artificial, Telef´onica, Madrid, Spain, IFEMA, Madrid, Spain., Universidad Oberta De Catalunya, Barcelona, Spain 备注:24 pages, 8 figures, 7 tables 摘要:在过去几年中,由于使用了转换器,文本分类任务有了很大的改进。然而,大多数研究集中在散文文本上,诗歌受到的关注较少,尤其是西班牙语。在本文中,我们提出了一种半监督学习方法来推断由4572首十四行诗以及10首情感和词汇语义多类十四行诗组成的21个心理类别。用于训练评估的诗歌子集包括270首十四行诗。通过我们的方法,76%的心理类别的AUC超过0.7,多类别类别的AUC超过0.65。十四行诗是用Transformer,通过句子嵌入,以及通过使用外部词汇获得的词汇语义和情感特征来建模的。因此,我们看到,与单独使用Transformer相比,这种方法提供的AUC增加高达0.12。 摘要:Text classification tasks have improved substantially during the last years by the usage of transformers. However, the majority of researches focus on prose texts, with poetry receiving less attention, specially for Spanish language. In this paper, we propose a semi-supervised learning approach for inferring 21 psychological categories evoked by a corpus of 4572 sonnets, along with 10 affective and lexico-semantic multiclass ones. The subset of poems used for training an evaluation includes 270 sonnets. With our approach, we achieve an AUC beyond 0.7 for 76% of the psychological categories, and an AUC over 0.65 for 60% on the multiclass ones. The sonnets are modelled using transformers, through sentence embeddings, along with lexico-semantic and affective features, obtained by using external lexicons. Consequently, we see that this approach provides an AUC increase of up to 0.12, as opposed to using transformers alone.

【3】 MapRE: An Effective Semantic Mapping Approach for Low-resource Relation Extraction 标题:MapRE:一种有效的低资源关系抽取语义映射方法 链接:https://arxiv.org/abs/2109.04108

作者:Manqing Dong,Chunguang Pan,Zhipeng Luo 机构:DeepBlue Technology (Shanghai) Co., Ltd 备注:Accepted as a long paper in the main conference of EMNLP 2021 摘要:近年来,神经关系提取模型已显示出良好的效果;然而,仅在少量训练样本的情况下,模型性能会急剧下降。最近的工作试图利用少数镜头学习的进展来解决资源不足的问题,他们训练标签不可知模型来直接比较嵌入空间中上下文句子之间的语义相似性。然而,标签感知信息,即包含关系本身语义知识的关系标签,在预测中往往被忽略。在这项工作中,我们提出了一个同时考虑标签不可知和标签感知语义映射信息的低资源关系提取框架。我们表明,在预训练和微调中结合上述两种类型的映射信息可以显著提高低资源关系提取任务的模型性能。 摘要:Neural relation extraction models have shown promising results in recent years; however, the model performance drops dramatically given only a few training samples. Recent works try leveraging the advance in few-shot learning to solve the low resource problem, where they train label-agnostic models to directly compare the semantic similarities among context sentences in the embedding space. However, the label-aware information, i.e., the relation label that contains the semantic knowledge of the relation itself, is often neglected for prediction. In this work, we propose a framework considering both label-agnostic and label-aware semantic mapping information for low resource relation extraction. We show that incorporating the above two types of mapping information in both pretraining and fine-tuning can significantly improve the model performance on low-resource relation extraction tasks.

Graph|知识图谱|Knowledge(9篇)

【1】 Cross-lingual Transfer for Text Classification with Dictionary-based Heterogeneous Graph 标题:基于字典异构图的文本分类跨语言迁移 链接:https://arxiv.org/abs/2109.04400

作者:Nuttapong Chairatanakul,Noppayut Sriwatanasakdi,Nontawat Charoenphakdee,Xin Liu,Tsuyoshi Murata 机构:Tokyo Institute of Technology,RWBC-OIL, AIST,Asurion Japan Holdings G.K., The University of Tokyo,RIKEN AIP,AIRC, AIST 备注:Published in Findings of EMNLP 2021 摘要:在跨语言文本分类中,要求高资源源语言中的任务特定训练数据可用,其中任务与低资源目标语言的任务相同。然而,由于标签成本、任务特征和隐私问题,收集此类训练数据可能是不可行的。本文提出了一种仅使用高资源语言和双语词典的任务无关单词嵌入的替代解决方案。首先,我们从双语词典中构造了一个基于词典的异构图(DHG)。这为使用图形神经网络进行跨语言迁移提供了可能性。剩下的挑战是DHG的异构性,因为考虑了多种语言。为了应对这一挑战,我们提出了基于词典的异构图神经网络(DHGNet),该网络通过两步聚合(词级聚合和语言级聚合)有效地处理DHG的异构性。实验结果表明,尽管我们的方法不能访问大型语料库,但其性能优于预训练模型。此外,即使字典中包含许多不正确的翻译,它也可以很好地执行。它的健壮性允许使用范围更广的词典,例如自动构建的词典和众包词典,这对于现实世界的应用非常方便。 摘要:In cross-lingual text classification, it is required that task-specific training data in high-resource source languages are available, where the task is identical to that of a low-resource target language. However, collecting such training data can be infeasible because of the labeling cost, task characteristics, and privacy concerns. This paper proposes an alternative solution that uses only task-independent word embeddings of high-resource languages and bilingual dictionaries. First, we construct a dictionary-based heterogeneous graph (DHG) from bilingual dictionaries. This opens the possibility to use graph neural networks for cross-lingual transfer. The remaining challenge is the heterogeneity of DHG because multiple languages are considered. To address this challenge, we propose dictionary-based heterogeneous graph neural network (DHGNet) that effectively handles the heterogeneity of DHG by two-step aggregations, which are word-level and language-level aggregations. Experimental results demonstrate that our method outperforms pretrained models even though it does not access to large corpora. Furthermore, it can perform well even though dictionaries contain many incorrect translations. Its robustness allows the usage of a wider range of dictionaries such as an automatically constructed dictionary and crowdsourced dictionary, which are convenient for real-world applications.

【2】 KELM: Knowledge Enhanced Pre-Trained Language Representations with Message Passing on Hierarchical Relational Graphs 标题:KELM:基于层次关系图消息传递的知识增强型预训练语言表示 链接:https://arxiv.org/abs/2109.04223

作者:Yinquan Lu,Haonan Lu,Guirong Fu,Qun Liu 机构: Huawei Technologies Co., Ltd., OPPO Guangdong Mobile Telecommunications Co., Ltd., ByteDance, Huawei Noah’s Ark Lab 摘要:将事实知识纳入预训练语言模型(PLM),如BERT,是最近NLP研究的一个新兴趋势。然而,现有的方法大多将外部知识集成模块与修改的预训练损失相结合,在大规模语料库上重新实现预训练过程。重新预训练这些模型通常会消耗资源,并且难以适应具有不同知识图(KG)的另一个领域。此外,这些作品要么无法根据文本语境动态嵌入知识语境,要么难以解决知识歧义问题。在本文中,我们提出了一种新的基于微调过程的知识感知语言模型框架,该框架为PLM提供了一个统一的知识增强文本图,其中包含从KG中提取的文本和多关系子图。我们设计了一种基于层次关系图的消息传递机制,该机制允许文本和文本的表示相互更新,并可以动态选择共享相同文本的模糊实体。我们的实证结果表明,与其他知识增强模型相比,我们的模型能够有效地将KGs中的世界知识整合到现有的语言模型(如BERT)中,并且在机器阅读理解(MRC)任务上取得了显著的改进。 摘要:Incorporating factual knowledge into pre-trained language models (PLM) such as BERT is an emerging trend in recent NLP studies. However, most of the existing methods combine the external knowledge integration module with a modified pre-training loss and re-implement the pre-training process on the large-scale corpus. Re-pretraining these models is usually resource-consuming, and difficult to adapt to another domain with a different knowledge graph (KG). Besides, those works either cannot embed knowledge context dynamically according to textual context or struggle with the knowledge ambiguity issue. In this paper, we propose a novel knowledge-aware language model framework based on fine-tuning process, which equips PLM with a unified knowledge-enhanced text graph that contains both text and multi-relational sub-graphs extracted from KG. We design a hierarchical relational-graph-based message passing mechanism, which can allow the representations of injected KG and text to mutually update each other and can dynamically select ambiguous mentioned entities that share the same text. Our empirical results show that our model can efficiently incorporate world knowledge from KGs into existing language models such as BERT, and achieve significant improvement on the machine reading comprehension (MRC) task compared with other knowledge-enhanced models.

【3】 TimeTraveler: Reinforcement Learning for Temporal Knowledge Graph Forecasting 标题:Timetraveler:强化学习在时态知识图预测中的应用 链接:https://arxiv.org/abs/2109.04101

作者:Haohai Sun,Jialun Zhong,Yunpu Ma,Zhen Han,Kun He 机构: School of Computer Science and Technology, Huazhong University of Science and Technology, Institute of Informatics, LMU Munich , Corporate Technology, Siemens AG 备注:EMNLP 2021 摘要:时态知识图(TKG)推理是近年来获得越来越多研究兴趣的一项重要任务。现有的大多数方法侧重于在过去的时间戳上进行推理以完成缺失的事实,而在已知的TKG上进行推理以预测未来事实的工作则很少。与完成任务相比,预测任务更加困难,面临两个主要挑战:(1)如何有效地建模时间信息以处理未来的时间戳?(2) 如何进行归纳推理来处理随着时间推移而出现的以前看不见的实体?为了应对这些挑战,我们提出了第一种用于预测的强化学习方法。具体来说,代理通过历史知识图快照来搜索答案。我们的方法定义了一个相对时间编码函数来捕获时间跨度信息,并设计了一种新的基于Dirichlet分布的时间型奖励来指导模型学习。此外,我们还提出了一种新的不可见实体表示方法,以提高模型的归纳推理能力。我们在未来的时间戳中评估我们的链路预测任务方法。在四个基准数据集上进行的大量实验表明,与现有的最新方法相比,性能有了显著提高,同时具有更高的解释性、更少的计算量和更少的参数。 摘要:Temporal knowledge graph (TKG) reasoning is a crucial task that has gained increasing research interest in recent years. Most existing methods focus on reasoning at past timestamps to complete the missing facts, and there are only a few works of reasoning on known TKGs to forecast future facts. Compared with the completion task, the forecasting task is more difficult that faces two main challenges: (1) how to effectively model the time information to handle future timestamps? (2) how to make inductive inference to handle previously unseen entities that emerge over time? To address these challenges, we propose the first reinforcement learning method for forecasting. Specifically, the agent travels on historical knowledge graph snapshots to search for the answer. Our method defines a relative time encoding function to capture the timespan information, and we design a novel time-shaped reward based on Dirichlet distribution to guide the model learning. Furthermore, we propose a novel representation method for unseen entities to improve the inductive inference ability of the model. We evaluate our method for this link prediction task at future timestamps. Extensive experiments on four benchmark datasets demonstrate substantial performance improvement meanwhile with higher explainability, less calculation, and fewer parameters when compared with existing state-of-the-art methods.

【4】 A Three-Stage Learning Framework for Low-Resource Knowledge-Grounded Dialogue Generation 标题:基于知识的低资源对话生成的三阶段学习框架 链接:https://arxiv.org/abs/2109.04096

作者:Shilei Liu,Xiaofeng Zhao,Bochao Li,Feiliang Ren,Longhui Zhang,Shujuan Yin 机构:School of Computer Science and Engineering, Key Laboratory of Medical Image Computing of Ministry of Education, Northeastern University, Shenyang, China 备注:Accepted by EMNLP 2021 main conference 摘要:神经会话模型通过引入外部背景知识,在产生流畅且信息丰富的回答方面显示出巨大的潜力。然而,构建这种以知识为基础的对话是很困难的,现有的模型在转移到训练样本有限的新领域时通常表现不佳。因此,在低资源环境下建立以知识为基础的对话体系仍然是一个关键问题。在本文中,我们提出了一种新的基于弱监督学习的三阶段学习框架,该框架得益于大规模的不固定对话和非结构化知识库。为了更好地与该框架合作,我们设计了一种带有解耦解码器的Transformer变体,它有助于响应生成和知识整合的解耦学习。对两个基准的评估结果表明,我们的方法可以在训练数据较少的情况下优于其他最先进的方法,即使在零资源的情况下,我们的方法仍然表现良好。 摘要:Neural conversation models have shown great potentials towards generating fluent and informative responses by introducing external background knowledge. Nevertheless, it is laborious to construct such knowledge-grounded dialogues, and existing models usually perform poorly when transfer to new domains with limited training samples. Therefore, building a knowledge-grounded dialogue system under the low-resource setting is a still crucial issue. In this paper, we propose a novel three-stage learning framework based on weakly supervised learning which benefits from large scale ungrounded dialogues and unstructured knowledge base. To better cooperate with this framework, we devise a variant of Transformer with decoupled decoder which facilitates the disentangled learning of response generation and knowledge incorporation. Evaluation results on two benchmarks indicate that our approach can outperform other state-of-the-art methods with less training data, and even in zero-resource scenario, our approach still performs well.

【5】 Graphine: A Dataset for Graph-aware Terminology Definition Generation 标题:Graphine:用于图感知术语定义生成的数据集 链接:https://arxiv.org/abs/2109.04018

作者:Zequn Liu,Shukai Wang,Yiyang Gu,Ruiyi Zhang,Ming Zhang,Sheng Wang 机构:Department of Computer Science, School of EECS, Peking University, Beijing, China, Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 备注:EMNLP 2021 摘要:准确定义术语是科学交流的第一步。开发用于定义生成的神经文本生成模型可以避免劳动强度管理,进一步加速科学发现。不幸的是,缺乏大规模术语定义数据集阻碍了定义生成的过程。在本文中,我们提出了一个大规模的术语定义数据集Graphine,涵盖2010648个术语定义对,跨越227个生物医学分支学科。每个分支学科中的术语进一步形成有向无环图,为开发图形感知文本生成模型开辟了新途径。然后,我们提出了一种新的图形感知定义生成模型Graphex,该模型将Transformer与图形神经网络相结合。通过利用术语的图形结构,我们的模型优于现有的文本生成模型。我们进一步展示了Graphine如何用于评估预训练语言模型、比较图形表示学习方法和预测句子粒度。我们设想Graphine是生物医学中定义生成和许多其他NLP任务的独特资源。 摘要:Precisely defining the terminology is the first step in scientific communication. Developing neural text generation models for definition generation can circumvent the labor-intensity curation, further accelerating scientific discovery. Unfortunately, the lack of large-scale terminology definition dataset hinders the process toward definition generation. In this paper, we present a large-scale terminology definition dataset Graphine covering 2,010,648 terminology definition pairs, spanning 227 biomedical subdisciplines. Terminologies in each subdiscipline further form a directed acyclic graph, opening up new avenues for developing graph-aware text generation models. We then proposed a novel graph-aware definition generation model Graphex that integrates transformer with graph neural network. Our model outperforms existing text generation models by exploiting the graph structure of terminologies. We further demonstrated how Graphine can be used to evaluate pretrained language models, compare graph representation learning methods and predict sentence granularity. We envision Graphine to be a unique resource for definition generation and many other NLP tasks in biomedicine.

【6】 Graph Based Network with Contextualized Representations of Turns in Dialogue 标题:对话中话轮的上下文表示的基于图的网络 链接:https://arxiv.org/abs/2109.04008

作者:Bongseok Lee,Yong Suk Choi 机构:Department of Computer Science, Hanyang University, Seoul, Korea 备注:EMNLP 2021 摘要:基于对话的关系提取(RE)旨在提取对话中出现的两个参数之间的关系。由于对话具有人称代词出现率高、信息密度低的特点,而且对话中的大多数关系事实都不受任何句子的支持,因此基于对话的关系抽取需要对对话进行全面的理解。在本文中,我们通过关注人们理解对话的方式,提出了回合上下文感知图卷积网络(TUCORE-GCN)。此外,我们还提出了一种新的方法,将会话中的情绪识别任务(ERC)视为基于对话的RE。在一个基于对话的RE数据集和三个ERC数据集上的实验表明,我们的模型在各种基于对话的自然语言理解任务中是非常有效的。在这些实验中,TUCORE-GCN在大多数基准数据集上都优于最先进的模型。我们的代码可在https://github.com/BlackNoodle/TUCORE-GCN. 摘要:Dialogue-based relation extraction (RE) aims to extract relation(s) between two arguments that appear in a dialogue. Because dialogues have the characteristics of high personal pronoun occurrences and low information density, and since most relational facts in dialogues are not supported by any single sentence, dialogue-based relation extraction requires a comprehensive understanding of dialogue. In this paper, we propose the TUrn COntext awaRE Graph Convolutional Network (TUCORE-GCN) modeled by paying attention to the way people understand dialogues. In addition, we propose a novel approach which treats the task of emotion recognition in conversations (ERC) as a dialogue-based RE. Experiments on a dialogue-based RE dataset and three ERC datasets demonstrate that our model is very effective in various dialogue-based natural language understanding tasks. In these experiments, TUCORE-GCN outperforms the state-of-the-art models on most of the benchmark datasets. Our code is available at https://github.com/BlackNoodle/TUCORE-GCN.

【7】 Unsupervised Pre-training with Structured Knowledge for Improving Natural Language Inference 标题:基于结构化知识的无监督预训练改进自然语言推理 链接:https://arxiv.org/abs/2109.03941

作者:Xiaoyu Yang,Xiaodan Zhu,Zhan Shi,Tianda Li 机构:ECE Department, Queen’s University, Canada 摘要:尽管近年来对自然语言推理的研究从大量的注释数据集中获益匪浅,但注释数据中提供的推理相关知识(包括常识)数量仍然相当有限。有两种方法可以用来进一步解决这一局限性:(1)无监督的预训练可以在更大的非结构化文本数据中利用知识;(2) 基于神经网络的NLI模型已经开始考虑结构化(通常是人类策划的)知识。一个紧迫的问题是这两种方法是否相互补充,或者如何开发能够将它们的优势结合在一起的模型。在本文中,我们提出了在预训练模型的不同组件中利用结构化知识的模型。我们的结果表明,所提出的模型比以前基于BERT的最新模型具有更好的性能。虽然我们的模型是针对NLI提出的,但它们可以很容易地扩展到其他句子或句子对分类问题。 摘要:While recent research on natural language inference has considerably benefited from large annotated datasets, the amount of inference-related knowledge (including commonsense) provided in the annotated data is still rather limited. There have been two lines of approaches that can be used to further address the limitation: (1) unsupervised pretraining can leverage knowledge in much larger unstructured text data; (2) structured (often human-curated) knowledge has started to be considered in neural-network-based models for NLI. An immediate question is whether these two approaches complement each other, or how to develop models that can bring together their advantages. In this paper, we propose models that leverage structured knowledge in different components of pre-trained models. Our results show that the proposed models perform better than previous BERT-based state-of-the-art models. Although our models are proposed for NLI, they can be easily extended to other sentence or sentence-pair classification problems.

【8】 Knowledge mining of unstructured information: application to cyber-domain 标题:非结构化信息的知识挖掘:在网络领域的应用 链接:https://arxiv.org/abs/2109.03848

作者:Tuomas Takko,Kunal Bhattacharya,Martti Lehto,Pertti Jalasvirta,Aapo Cederberg,Kimmo Kaski 机构:Department of Computer Science, Aalto University School of Science, Department of Industrial Engineering and Management, University of Jyväskylä, PO Box , Finland, Cyberwatch Finland Oy, Tietokuja , Finland, The Alan Turing Institute 摘要:网络情报在许多公开的在线来源中广泛而丰富,并有关于漏洞和事件的报告。这种持续不断的嘈杂信息流需要新的工具和技术,才能使各种组织的分析师和调查人员受益。在本文中,我们提出并实现了一个新的知识图和知识挖掘框架,用于从网络领域事件的自由文本中提取相关信息。我们的框架包括一个基于机器学习的管道,以及使用我们的非技术性网络本体生成实体图、攻击者和相关信息的爬行方法。我们在公开的网络事件数据集上测试我们的框架,以评估我们的知识挖掘方法的准确性以及框架在网络分析师使用中的有用性。我们的结果表明,分析使用新框架构建的知识图,分析师可以从当前网络环境中推断出更多信息,包括对不同实体的风险以及行业和国家之间的风险传播。扩展框架以容纳更多技术和操作层面的信息可以提高知识图中趋势和风险的准确性和可解释性。 摘要:Cyber intelligence is widely and abundantly available in numerous open online sources with reports on vulnerabilities and incidents. This constant stream of noisy information requires new tools and techniques if it is to be used for the benefit of analysts and investigators in various organizations. In this paper we present and implement a novel knowledge graph and knowledge mining framework for extracting relevant information from free-form text about incidents in the cyber domain. Our framework includes a machine learning based pipeline as well as crawling methods for generating graphs of entities, attackers and the related information with our non-technical cyber ontology. We test our framework on publicly available cyber incident datasets to evaluate the accuracy of our knowledge mining methods as well as the usefulness of the framework in the use of cyber analysts. Our results show analyzing the knowledge graph constructed using the novel framework, an analyst can infer additional information from the current cyber landscape in terms of risk to various entities and the propagation of risk between industries and countries. Expanding the framework to accommodate more technical and operational level information can increase the accuracy and explainability of trends and risk in the knowledge graph.

【9】 Powering Comparative Classification with Sentiment Analysis via Domain Adaptive Knowledge Transfer 标题:基于领域自适应知识转移的情感分析增强比较分类能力 链接:https://arxiv.org/abs/2109.03819

作者:Zeyu Li,Yilong Qin,Zihan Liu,Wei Wang 机构:Department of Compute Science, University of California, Los Angeles 备注:13 pages; EMNLP-2021 Main Conference 摘要:我们研究比较偏好分类(CPC),其目的是预测给定句子中两个实体之间是否存在偏好比较,如果存在,哪个实体比另一个实体更优先。高质量的CPC模型可以大大有利于应用程序,如比较性问题回答和基于评论的建议。在现有的方法中,非深度学习方法的性能较差。基于最先进图形神经网络的ED-GAT(Ma等人,2020)只考虑句法信息,而忽略了关键语义关系和对比较实体的情感。我们提出了情感分析增强比较网络(SAECON),该网络使用情感分析器通过领域自适应知识转移将情感学习到单个实体,从而提高了CPC的准确性。在CompSent-19(Panchenko et al.,2019)数据集上的实验表明,与现有的最佳CPC方法相比,F1成绩有了显著提高。 摘要:We study Comparative Preference Classification (CPC) which aims at predicting whether a preference comparison exists between two entities in a given sentence and, if so, which entity is preferred over the other. High-quality CPC models can significantly benefit applications such as comparative question answering and review-based recommendations. Among the existing approaches, non-deep learning methods suffer from inferior performances. The state-of-the-art graph neural network-based ED-GAT (Ma et al., 2020) only considers syntactic information while ignoring the critical semantic relations and the sentiments to the compared entities. We proposed sentiment Analysis Enhanced COmparative Network (SAECON) which improves CPC ac-curacy with a sentiment analyzer that learns sentiments to individual entities via domain adaptive knowledge transfer. Experiments on the CompSent-19 (Panchenko et al., 2019) dataset present a significant improvement on the F1 scores over the best existing CPC approaches.

摘要|信息提取(3篇)

【1】 ARMAN: Pre-training with Semantically Selecting and Reordering of Sentences for Persian Abstractive Summarization 标题:ARMAN:用于波斯语摘要的句子语义选择和重新排序的预训练 链接:https://arxiv.org/abs/2109.04098

作者:Alireza Salemi,Emad Kebriaei,Ghazal Neisi Minaei,Azadeh Shakery 机构:School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran, School of Computer Science, Institute for Research in Fundamental Sciences (IPM), Iran 摘要:摘要文本摘要是受预先训练的语言模型影响的领域之一。目前的摘要预训练工作更多地关注与正文共同的词较多的摘要,而较少关注生成的句子与原始文档之间的语义相似性。为了解决这个问题,我们提出了基于Transformer的预训练编码器-解码器模型ARMAN。在ARMAN中,根据修改后的语义分数选择文档中的显著句子,以隐藏并形成伪摘要。为了更准确地总结人类的写作模式,我们采用了修正的句子重排。我们在六个下游波斯总结任务上评估了我们提出的模型。实验结果表明,我们提出的模型在通过ROUGE和BERTScore测量的所有六个摘要任务上都达到了最先进的性能。我们的模型在文本蕴涵、问题释义和多项选择题回答方面也优于以前的工作。最后,我们建立了一个人类评估,并表明使用语义评分可以显著提高摘要结果。 摘要:Abstractive text summarization is one of the areas influenced by the emergence of pre-trained language models. Current pre-training works in abstractive summarization give more points to the summaries with more words in common with the main text and pay less attention to the semantic similarity between generated sentences and the original document. We propose ARMAN, a Transformer-based encoder-decoder model pre-trained with three novel objectives to address this issue. In ARMAN, salient sentences from a document are selected according to a modified semantic score to be masked and form a pseudo summary. To summarize more accurately and similar to human writing patterns, we applied modified sentence reordering. We evaluated our proposed models on six downstream Persian summarization tasks. Experimental results show that our proposed model achieves state-of-the-art performance on all six summarization tasks measured by ROUGE and BERTScore. Our models also outperform prior works in textual entailment, question paraphrasing, and multiple choice question answering. Finally, we established a human evaluation and show that using the semantic score significantly improves summarization results.

【2】 Low-Resource Dialogue Summarization with Domain-Agnostic Multi-Source Pretraining 标题:基于领域不可知的多源预训练的低资源对话摘要 链接:https://arxiv.org/abs/2109.04080

作者:Yicheng Zou,Bolin Zhu,Xingwu Hu,Tao Gui,Qi Zhang 机构:Institute of Modern Languages and Linguistics, Fudan University, Shanghai Key Laboratory of Intelligent Information Processing, Fudan University, School of Computer Science, Fudan University, Shanghai, China 备注:Accepted by EMNLP 2021, 12 pages 摘要:随着日常生活中对话数据量的迅速增加,对对话摘要的需求也越来越大。不幸的是,由于带有注释摘要的对话数据不足,训练大型摘要模型通常是不可行的。大多数现有的低资源对话摘要的工作直接针对其他领域的训练模型,例如新闻领域,但是他们通常忽略了对话和传统文章之间的巨大差异。为了弥补域外预训练和域内微调之间的差距,在这项工作中,我们提出了一种多源预训练范例,以更好地利用外部摘要数据。具体来说,我们利用大规模的域内非摘要数据来分别预训练对话编码器和摘要解码器。然后,使用对抗性评论对组合编码器-解码器模型在域外摘要数据上进行预训练,以促进域不可知摘要。在两个公共数据集上的实验结果表明,在有限的训练数据下,我们的方法获得了有竞争力的性能,并且在不同的对话场景中具有良好的通用性。 摘要:With the rapid increase in the volume of dialogue data from daily life, there is a growing demand for dialogue summarization. Unfortunately, training a large summarization model is generally infeasible due to the inadequacy of dialogue data with annotated summaries. Most existing works for low-resource dialogue summarization directly pretrain models in other domains, e.g., the news domain, but they generally neglect the huge difference between dialogues and conventional articles. To bridge the gap between out-of-domain pretraining and in-domain fine-tuning, in this work, we propose a multi-source pretraining paradigm to better leverage the external summary data. Specifically, we exploit large-scale in-domain non-summary data to separately pretrain the dialogue encoder and the summary decoder. The combined encoder-decoder model is then pretrained on the out-of-domain summary data using adversarial critics, aiming to facilitate domain-agnostic summarization. The experimental results on two public datasets show that with only limited training data, our approach achieves competitive performance and generalizes well in different dialogue scenarios.

【3】 Sparsity and Sentence Structure in Encoder-Decoder Attention of Summarization Systems 标题:摘要系统编码器-解码器注意力中的稀疏性与句子结构 链接:https://arxiv.org/abs/2109.03888

作者:Potsawee Manakul,Mark J. F. Gales 机构:Department of Engineering, University of Cambridge 备注:EMNLP 2021 (short paper, main conference) 摘要:Transformer模型在包括摘要在内的一系列NLP任务中取得了最新成果。使用大型Transformer模型进行训练和推理在计算上可能会很昂贵。以前的工作集中在一个重要的瓶颈,编码器中的二次自我注意机制。改进的编码器架构(如LED或LoBART)使用局部注意模式来解决这个问题,以便进行总结。相比之下,这项工作的重点是Transformer的编码器-解码器注意机制。这种关注的代价在需要模型生成历史的推理或训练方法中变得更为重要。首先,我们检查编码器-解码器的复杂性。我们从经验上证明,文档摘要中存在一种稀疏的句子结构,可以通过将注意机制限制在输入句子的子集上加以利用,同时保持系统性能。其次,我们提出了一种改进的结构,该结构选择句子子集来约束编解码器的注意力。实验在抽象摘要任务上进行,包括CNN/DailyMail、XSum、Spotify播客和arXiv。 摘要:Transformer models have achieved state-of-the-art results in a wide range of NLP tasks including summarization. Training and inference using large transformer models can be computationally expensive. Previous work has focused on one important bottleneck, the quadratic self-attention mechanism in the encoder. Modified encoder architectures such as LED or LoBART use local attention patterns to address this problem for summarization. In contrast, this work focuses on the transformer's encoder-decoder attention mechanism. The cost of this attention becomes more significant in inference or training approaches that require model-generated histories. First, we examine the complexity of the encoder-decoder attention. We demonstrate empirically that there is a sparse sentence structure in document summarization that can be exploited by constraining the attention mechanism to a subset of input sentences, whilst maintaining system performance. Second, we propose a modified architecture that selects the subset of sentences to constrain the encoder-decoder attention. Experiments are carried out on abstractive summarization tasks, including CNN/DailyMail, XSum, Spotify Podcast, and arXiv.
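
下面是「把编码器-解码器注意力限制在被选中句子子集上」这一思路的极简 Python 示意(句子选择方式与掩码形式均为假设的简化版本,非论文实现):

```python
import numpy as np

def sentence_constrained_mask(sent_ids, selected_sents):
    # sent_ids[i] 表示第 i 个输入 token 属于哪个句子;
    # 只有属于被选中句子的 token 才允许被解码器的交叉注意力访问
    sent_ids = np.asarray(sent_ids)
    return np.isin(sent_ids, list(selected_sents))

sent_ids = [0, 0, 0, 1, 1, 2, 2, 2, 2, 3, 3]   # 11 个 token,来自 4 个句子
mask = sentence_constrained_mask(sent_ids, selected_sents={0, 2})
print(mask.astype(int))  # 交叉注意力计算时,未被选中句子(1、3)的 token 将被屏蔽
```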

推理|分析|理解|解释(3篇)

【1】 Analysis of Language Change in Collaborative Instruction Following 标题:协作式指令遵循中的语言变化分析 链接:https://arxiv.org/abs/2109.04452

作者:Anna Effenberger,Eva Yan,Rhia Singh,Alane Suhr,Yoav Artzi 机构:Cornell University, City University of New York 备注:Findings of EMNLP 2021 Short Paper 摘要:我们分析了在协作、目标导向的教学任务中,随着时间的推移语言的变化,其中效用最大化的参与者形成惯例并增加他们的专业知识。先前的工作主要在参考游戏的背景下研究这类场景,并一致发现,随着约定的形成,语言复杂度在多个维度(如话语长度)上降低。相比之下,我们发现,考虑到提高教学效用的能力,教师在这些先前研究的维度上增加了语言复杂性,以便更好地与越来越熟练的教学追随者合作。 摘要:We analyze language change over time in a collaborative, goal-oriented instructional task, where utility-maximizing participants form conventions and increase their expertise. Prior work studied such scenarios mostly in the context of reference games, and consistently found that language complexity is reduced along multiple dimensions, such as utterance length, as conventions are formed. In contrast, we find that, given the ability to increase instruction utility, instructors increase language complexity along these previously studied dimensions to better collaborate with increasingly skilled instruction followers.

【2】 Avoiding Inference Heuristics in Few-shot Prompt-based Finetuning 标题:在少样本的基于提示微调中避免推理启发式 链接:https://arxiv.org/abs/2109.04144

作者:Prasetya Ajie Utama,Nafise Sadat Moosavi,Victor Sanh,Iryna Gurevych 机构:†Research Training Group AIPHES, ‡UKP Lab, Technische Universität Darmstadt, ♣Hugging Face, Brooklyn, USA 备注:Accepted at EMNLP 2021 摘要:最新的基于提示的方法允许预先训练的语言模型通过将下游任务重新格式化为语言建模问题,在几次微调上实现强大的性能。在这项工作中,我们证明,尽管基于提示的精细句子对分类模型在低数据率方面具有优势,但它仍然存在采用基于词汇重叠的推理启发法的常见缺陷,例如。,模型错误地假设句子对具有相同的含义,因为它们由相同的词组成。有趣的是,我们发现在基于提示的模型的Few-Shot评估中,这种特殊的推理启发明显较少,这表明微调如何破坏在预训练期间学习的有用知识。然后,我们证明,添加一个保持预训练权重的正则化可以有效地缓解这种Few-Shot微调的破坏性趋势。我们对三个数据集的评估表明,在用于诊断推理启发式的三个相应的挑战数据集上,有很大的改进。 摘要:Recent prompt-based approaches allow pretrained language models to achieve strong performances on few-shot finetuning by reformulating downstream tasks as a language modeling problem. In this work, we demonstrate that, despite its advantages on low data regimes, finetuned prompt-based models for sentence pair classification tasks still suffer from a common pitfall of adopting inference heuristics based on lexical overlap, e.g., models incorrectly assuming a sentence pair is of the same meaning because they consist of the same set of words. Interestingly, we find that this particular inference heuristic is significantly less present in the zero-shot evaluation of the prompt-based model, indicating how finetuning can be destructive to useful knowledge learned during the pretraining. We then show that adding a regularization that preserves pretraining weights is effective in mitigating this destructive tendency of few-shot finetuning. Our evaluation on three datasets demonstrates promising improvements on the three corresponding challenge datasets used to diagnose the inference heuristics.

【3】 Debiasing Methods in Natural Language Understanding Make Bias More Accessible 标题:自然语言理解中的去偏方法使偏见更容易获得 链接:https://arxiv.org/abs/2109.04095

作者:Michael Mendelson,Yonatan Belinkov 机构:Technion – Israel Institute of Technology 备注:Accepted at EMNLP 2021 摘要:模型对偏差的鲁棒性通常取决于对精心设计的分布外数据集的泛化。自然语言理解(NLU)中的最新debiasing方法通过迫使模型做出无偏预测来提高这类数据集的性能。这种方法背后的一个基本假设是,这也会导致在模型的内部表示中发现更健壮的特征。我们提出了一个通用的基于探测的框架,该框架允许对语言模型中的偏见进行事后解释,并使用信息论方法测量从模型表示中提取某些偏见的能力。我们对几个NLU数据集和已知的偏差进行了实验,结果表明,与直觉相反,语言模型越倾向于Debiase模式,其内部表示中实际编码的偏差就越多。 摘要:Model robustness to bias is often determined by the generalization on carefully designed out-of-distribution datasets. Recent debiasing methods in natural language understanding (NLU) improve performance on such datasets by pressuring models into making unbiased predictions. An underlying assumption behind such methods is that this also leads to the discovery of more robust features in the model's inner representations. We propose a general probing-based framework that allows for post-hoc interpretation of biases in language models, and use an information-theoretic approach to measure the extractability of certain biases from the model's representations. We experiment with several NLU datasets and known biases, and show that, counter-intuitively, the more a language model is pushed towards a debiased regime, the more bias is actually encoded in its inner representations.

GAN|对抗|攻击|生成相关(3篇)

【1】 Contrasting Human- and Machine-Generated Word-Level Adversarial Examples for Text Classification 标题:用于文本分类的人工生成和机器生成的词级对抗性示例对比 链接:https://arxiv.org/abs/2109.04385

作者:Maximilian Mozes,Max Bartolo,Pontus Stenetorp,Bennett Kleinberg,Lewis D. Griffin 机构:University College London, Tilburg University 备注:EMNLP 2021 摘要:研究表明,人们普遍认为自然语言处理模型容易受到敌对攻击;但最近的工作已经引起了人们对根据某些标准(例如,保留语义和语法性)验证这些对抗性输入的关注。强制执行约束以支持此类标准可能会导致攻击失败,从而引发有效攻击是否实际可行的问题。在这项工作中,我们通过人类语言能力的视角来研究这一点。我们报告了众包研究,其中我们要求人类迭代修改输入文本中的单词,同时接收即时模型反馈,目的是导致情绪分类模型对示例进行错误分类。我们的研究结果表明,人类能够使用保留语义的词语替换产生大量的对抗性例子。我们从自然度、情感保留、语法性和替换率等维度分析了人工生成的对抗性示例与最近提出的TextWiller、Genetic、BAE和Semepso攻击算法的比较。我们的研究结果表明,人类生成的对抗性示例并不比最佳算法更能生成自然阅读、情感保留的示例,尽管它们的计算效率更高。 摘要:Research shows that natural language processing models are generally considered to be vulnerable to adversarial attacks; but recent work has drawn attention to the issue of validating these adversarial inputs against certain criteria (e.g., the preservation of semantics and grammaticality). Enforcing constraints to uphold such criteria may render attacks unsuccessful, raising the question of whether valid attacks are actually feasible. In this work, we investigate this through the lens of human language ability. We report on crowdsourcing studies in which we task humans with iteratively modifying words in an input text, while receiving immediate model feedback, with the aim of causing a sentiment classification model to misclassify the example. Our findings suggest that humans are capable of generating a substantial amount of adversarial examples using semantics-preserving word substitutions. We analyze how human-generated adversarial examples compare to the recently proposed TextFooler, Genetic, BAE and SememePSO attack algorithms on the dimensions naturalness, preservation of sentiment, grammaticality and substitution rate. Our findings suggest that human-generated adversarial examples are not more able than the best algorithms to generate natural-reading, sentiment-preserving examples, though they do so by being much more computationally efficient.

【2】 Multi-granularity Textual Adversarial Attack with Behavior Cloning 标题:基于行为克隆的多粒度文本对抗攻击 链接:https://arxiv.org/abs/2109.04367

作者:Yangyi Chen,Jin Su,Wei Wei 机构:Cognitive Computing and Intelligent Information Processing Laboratory, School of Computer, Science and Technology Huazhong University of Science and Tehchnology, School of Software Engineering, Huazhong University of Science and Tehchnology 备注:Accepted by the main conference of EMNLP 2021 摘要:近年来,文本对抗攻击模型由于能够成功地估计NLP模型的鲁棒性而越来越流行。然而,现有的工作有明显的不足。(1)他们通常只考虑修改策略的单一粒度(例如,词级或句子级),这是不足以探索整体的篇章空间来生成的;(2) 他们需要数百次查询受害者模型才能成功进行攻击,这在实践中效率很低。为了解决这些问题,本文提出了一种多粒度攻击模型MAYA,它可以有效地生成高质量的对抗性样本,而对受害者模型的查询更少。此外,我们提出了一种基于强化学习的方法,利用MAYA算法中的专家知识,通过行为克隆来训练多粒度攻击代理,以进一步减少查询时间。此外,我们还使代理适应于攻击只输出标签而不输出置信度分数的黑盒模型。我们通过在两种不同的黑盒攻击设置和三个基准数据集中攻击BiLSTM、BERT和RoBERTa,进行综合实验来评估我们的攻击模型。实验结果表明,与基线模型相比,我们的模型总体上取得了更好的攻击性能,并产生了更流畅、更符合语法的对抗性样本。此外,我们的对抗式攻击代理显著减少了两种攻击设置下的查询时间。我们的代码发布于https://github.com/Yangyi-Chen/MAYA. 摘要:Recently, the textual adversarial attack models become increasingly popular due to their successful in estimating the robustness of NLP models. However, existing works have obvious deficiencies. (1) They usually consider only a single granularity of modification strategies (e.g. word-level or sentence-level), which is insufficient to explore the holistic textual space for generation; (2) They need to query victim models hundreds of times to make a successful attack, which is highly inefficient in practice. To address such problems, in this paper we propose MAYA, a Multi-grAnularitY Attack model to effectively generate high-quality adversarial samples with fewer queries to victim models. Furthermore, we propose a reinforcement-learning based method to train a multi-granularity attack agent through behavior cloning with the expert knowledge from our MAYA algorithm to further reduce the query times. Additionally, we also adapt the agent to attack black-box models that only output labels without confidence scores. We conduct comprehensive experiments to evaluate our attack models by attacking BiLSTM, BERT and RoBERTa in two different black-box attack settings and three benchmark datasets. Experimental results show that our models achieve overall better attacking performance and produce more fluent and grammatical adversarial samples compared to baseline models. Besides, our adversarial attack agent significantly reduces the query times in both attack settings. Our codes are released at https://github.com/Yangyi-Chen/MAYA.

【3】 Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models 标题:检索、字幕、生成:文本生成模型中增强常识性的视觉基础 链接:https://arxiv.org/abs/2109.03892

作者:Steven Y. Feng,Kevin Lu,Zhuofu Tao,Malihe Alikhani,Teruko Mitamura,Eduard Hovy,Varun Gangal 机构:Carnegie Mellon University,University of Waterloo, University of California Los Angeles,University of Pittsburgh 摘要:我们研究了使用图像中包含的多模态信息作为一种有效的方法来增强用于文本生成的Transformer模型的常识。我们使用BART和T5对概念到文本的生成进行了实验,特别是生成性常识推理(CommonGen)的任务。我们将我们的方法称为VisCTG:基于视觉的文本生成概念。VisCTG包括为代表适当日常场景的图像添加字幕,并使用这些字幕来丰富和指导生成过程。综合评估和分析表明,VisCTG显著提高了模型性能,同时成功地解决了基线生成的几个问题,包括常识性差、流利性和特异性。 摘要:We investigate the use of multimodal information contained in images as an effective method for enhancing the commonsense of Transformer models for text generation. We perform experiments using BART and T5 on concept-to-text generation, specifically the task of generative commonsense reasoning, or CommonGen. We call our approach VisCTG: Visually Grounded Concept-to-Text Generation. VisCTG involves captioning images representing appropriate everyday scenarios, and using these captions to enrich and steer the generation process. Comprehensive evaluation and analysis demonstrate that VisCTG noticeably improves model performance while successfully addressing several issues of the baseline generations, including poor commonsense, fluency, and specificity.

半/弱/无监督|不确定性(4篇)

【1】 ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding 标题:ESimCSE:无监督语句嵌入对比学习的增强型样本构建方法 链接:https://arxiv.org/abs/2109.04380

作者:Xing Wu,Chaochen Gao,Liangjun Zang,Jizhong Han,Zhongyuan Wang,Songlin Hu 机构:Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Kuaishou Technology, Beijing, China 备注:9 pages, 2 figures 摘要:对比学习因其在无监督句中的嵌入而备受关注。目前最先进的无监督方法是无监督SimCSE(unsup SimCSE)。Unsup SimCSE将dropout作为最小的数据扩充方法,并将相同的输入语句传递给预训练的Transformer编码器(打开dropout)两次,以获得两个相应的嵌入,以构建正对。由于在转换器中使用位置嵌入,句子的长度信息通常会被编码到句子嵌入中,因此unsup SimCSE中的每个正对实际上包含相同的长度信息。因此,用这些阳性对训练的UnSuPS-SIMCSE可能是偏颇的,这将倾向于认为相同或相似长度的句子在语义上更相似。通过统计观察,我们发现unsup SimCSE确实存在这样的问题。为了缓解这种情况,我们使用一个简单的重复操作来修改输入句子,然后将输入句子和修改后的句子分别传递给预训练的Transformer编码器,以获得正对。此外,我们从计算机视觉社区中汲取灵感,引入动量对比,在不进行额外计算的情况下扩大负对的数量。提出的两种修改分别应用于正负对,并构建了一种新的句子嵌入方法,称为增强型unsp-SimCSE(ESimCSE)。我们通过语义文本相似性(STS)任务,在几个基准数据集上对所提出的ESimCSE进行了评估。实验结果表明,ESimCSE在BERT基础上的平均Spearman相关性为2.02%,优于最先进的unsup SimCSE。 摘要:Contrastive learning has been attracting much attention for learning unsupervised sentence embeddings. The current state-of-the-art unsupervised method is the unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE takes dropout as a minimal data augmentation method, and passes the same input sentence to a pre-trained Transformer encoder (with dropout turned on) twice to obtain the two corresponding embeddings to build a positive pair. As the length information of a sentence will generally be encoded into the sentence embeddings due to the usage of position embedding in Transformer, each positive pair in unsup-SimCSE actually contains the same length information. And thus unsup-SimCSE trained with these positive pairs is probably biased, which would tend to consider that sentences of the same or similar length are more similar in semantics. Through statistical observations, we find that unsup-SimCSE does have such a problem. To alleviate it, we apply a simple repetition operation to modify the input sentence, and then pass the input sentence and its modified counterpart to the pre-trained Transformer encoder, respectively, to get the positive pair. Additionally, we draw inspiration from the community of computer vision and introduce a momentum contrast, enlarging the number of negative pairs without additional calculations. The proposed two modifications are applied on positive and negative pairs separately, and build a new sentence embedding method, termed Enhanced Unsup-SimCSE (ESimCSE). We evaluate the proposed ESimCSE on several benchmark datasets w.r.t the semantic text similarity (STS) task. Experimental results show that ESimCSE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 2.02% on BERT-base.
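
下面用一个简单的Python函数示意摘要中的"单词重复"增强:随机复制输入句子中一小部分token,得到长度不同但语义基本不变的正例视图;重复比例等超参数为示例取值,并非论文设定。

```python
import random

def word_repetition(tokens, dup_rate=0.2, seed=None):
    """随机选取部分token原地重复一次,构造ESimCSE式的正例。"""
    rng = random.Random(seed)
    n_dup = max(1, int(len(tokens) * dup_rate))
    dup_positions = set(rng.sample(range(len(tokens)), n_dup))
    out = []
    for i, tok in enumerate(tokens):
        out.append(tok)
        if i in dup_positions:
            out.append(tok)          # 重复该token,打破正例对长度完全一致的偏置
    return out

print(word_repetition("i like this movie very much".split(), seed=0))
```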

【2】 Uncertainty Measures in Neural Belief Tracking and the Effects on Dialogue Policy Performance 标题:神经信念跟踪中的不确定性度量及其对对话策略绩效的影响 链接:https://arxiv.org/abs/2109.04349

作者:Carel van Niekerk,Andrey Malinin,Christian Geishauser,Michael Heck,Hsien-chin Lin,Nurul Lubis,Shutong Feng,Milica Gašić 机构:Heinrich Heine Universität Düsseldorf, Düsseldorf, Germany, Yandex Research and HSE University, Moscow, Russia 备注:14 pages, 2 figures, accepted at EMNLP 2021 Main conference, Code at: this https URL 摘要:识别和解决不确定性的能力对于对话系统的稳健性至关重要。事实上,这已经在使用贝叶斯方法进行对话信念跟踪的系统上得到了实证证实。然而,这样的系统只考虑置信度估计,并且难以缩放到更复杂的设置。另一方面,神经对话系统很少考虑不确定性。因此,他们对自己的决定过于自信,缺乏活力。此外,通常单独评估跟踪任务的性能,而不考虑其对下游政策优化的影响。我们建议在神经信念跟踪中使用不同的不确定性度量。通过与用户模拟器交互,向策略和训练策略的特征空间添加选定的不确定性度量,评估这些度量对策略优化下游任务的影响。人工和模拟用户结果均表明,采用这些措施可提高下游对话策略的性能和鲁棒性。这突出了开发考虑不确定性的神经对话信念跟踪器的重要性。 摘要:The ability to identify and resolve uncertainty is crucial for the robustness of a dialogue system. Indeed, this has been confirmed empirically on systems that utilise Bayesian approaches to dialogue belief tracking. However, such systems consider only confidence estimates and have difficulty scaling to more complex settings. Neural dialogue systems, on the other hand, rarely take uncertainties into account. They are therefore overconfident in their decisions and less robust. Moreover, the performance of the tracking task is often evaluated in isolation, without consideration of its effect on the downstream policy optimisation. We propose the use of different uncertainty measures in neural belief tracking. The effects of these measures on the downstream task of policy optimisation are evaluated by adding selected measures of uncertainty to the feature space of the policy and training policies through interaction with a user simulator. Both human and simulated user results show that incorporating these measures leads to improvements both of the performance and of the robustness of the downstream dialogue policy. This highlights the importance of developing neural dialogue belief trackers that take uncertainty into account.
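
该工作的核心做法是把信念跟踪器输出分布上的不确定性度量拼接进对话策略的特征空间。下面示意两个常见度量(熵与最大概率置信度)的计算方式;论文中具体采用哪些度量及其组合,请以原文为准。

```python
import torch

def uncertainty_features(belief_probs, eps=1e-12):
    """
    belief_probs: [num_values] 某一槽位取值上的信念分布(和为1)
    返回(熵, 最大概率置信度),可作为策略网络的附加输入特征。
    """
    p = belief_probs.clamp_min(eps)
    entropy = -(p * p.log()).sum()
    confidence = belief_probs.max()
    return entropy.item(), confidence.item()

probs = torch.tensor([0.6, 0.3, 0.1])
print(uncertainty_features(probs))   # 约为 (0.898, 0.6)
```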

【3】 Smoothed Contrastive Learning for Unsupervised Sentence Embedding 标题:无监督句子嵌入的平滑对比学习 链接:https://arxiv.org/abs/2109.04321

作者:Xing Wu,Chaochen Gao,Liangjun Zang,Jizhong Han,Zhongyuan Wang,Songlin Hu 机构:Institute of Information Engineering, Chinese Academy of Sciences, Beijing, China, School of Cyber Security, University of Chinese Academy of Sciences, Beijing, China, Kuaishou Technology, Beijing, China 备注:6 pages, 2 figures 摘要:对比学习已逐渐应用于学习高质量的无监督句子嵌入。在以前的无监督方法中,据我们所知,最新最先进的方法是无监督SimCSE(unsup-SimCSE)。Unsup-SimCSE在训练阶段使用InfoNCE损失函数,将语义相似的句子拉到一起,并将不相似的句子分开。理论上,我们希望在unsup-SimCSE中使用更大的批次,以便在样本之间进行更充分的比较,并避免过度拟合。但是,增加批处理大小并不总是带来改进,反而会在批处理大小超过阈值时导致性能下降。通过统计观察,我们发现这可能是由于在增加批量后引入了低置信度负例对。为了缓解这个问题,我们在InfoNCE损失函数上引入了一种简单的平滑策略,称为高斯平滑InfoNCE(GS-InfoNCE)。具体地说,我们添加随机高斯噪声向量作为负样本,作为对负样本空间的平滑。虽然简单,但所提出的平滑策略为unsup-SimCSE带来了实质性的改进。我们在标准语义文本相似性(STS)任务上评估GS-InfoNCE。GS-InfoNCE在BERT-base、BERT-large、RoBERTa-base和RoBERTa-large的基础上分别以1.38%、0.72%、1.17%和0.28%的平均Spearman相关性优于最先进的unsup-SimCSE。 摘要:Contrastive learning has been gradually applied to learn high-quality unsupervised sentence embedding. Among the previous unsupervised methods, the latest state-of-the-art method, as far as we know, is unsupervised SimCSE (unsup-SimCSE). Unsup-SimCSE uses the InfoNCE loss function in the training stage by pulling semantically similar sentences together and pushing apart dissimilar ones. Theoretically, we expect to use larger batches in unsup-SimCSE to get more adequate comparisons among samples and avoid overfitting. However, increasing the batch size does not always lead to improvements, but instead even leads to performance degradation when the batch size exceeds a threshold. Through statistical observation, we find that this is probably due to the introduction of low-confidence negative pairs after increasing the batch size. To alleviate this problem, we introduce a simple smoothing strategy upon the InfoNCE loss function, termed Gaussian Smoothing InfoNCE (GS-InfoNCE). Specifically, we add random Gaussian noise vectors as negative samples, which act as a smoothing of the negative sample space. Though being simple, the proposed smoothing strategy brings substantial improvements to unsup-SimCSE. We evaluate GS-InfoNCE on the standard semantic text similarity (STS) task. GS-InfoNCE outperforms the state-of-the-art unsup-SimCSE by an average Spearman correlation of 1.38%, 0.72%, 1.17% and 0.28% on the base of BERT-base, BERT-large, RoBERTa-base and RoBERTa-large, respectively.
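
以下是GS-InfoNCE思路的极简示意:在批内负例之外,额外采样若干高斯噪声向量作为"平滑"负例一并参与InfoNCE损失;噪声数量、温度等均为示例取值。

```python
import torch
import torch.nn.functional as F

def gs_infonce(z1, z2, num_noise=16, tau=0.05):
    """z1、z2: [batch, dim],同一批句子两个视图的表示(正例对)。"""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    noise = F.normalize(torch.randn(num_noise, z1.size(1)), dim=-1)
    sim_pos = z1 @ z2.t() / tau                        # [batch, batch],对角线为正例
    sim_noise = z1 @ noise.t() / tau                   # [batch, num_noise],高斯负例
    logits = torch.cat([sim_pos, sim_noise], dim=1)
    labels = torch.arange(z1.size(0))
    return F.cross_entropy(logits, labels)

print(gs_infonce(torch.randn(8, 64), torch.randn(8, 64)).item())
```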

【4】 Variational Latent-State GPT for Semi-supervised Task-Oriented Dialog Systems 标题:半监督面向任务对话系统的变分潜态GPT 链接:https://arxiv.org/abs/2109.04314

作者:Hong Liu,Yucheng Cai,Zhenru Lin,Zhijian Ou,Yi Huang,Junlan Feng 机构: Speech Processing and Machine Intelligence Lab, Tsinghua University, Beijing, China, China Mobile Research Institute, Beijing, China 摘要:最近,两种方法,微调大型预训练语言模型和变分训练,分别引起了半监督端到端面向任务对话(TOD)系统的极大兴趣。在本文中,我们提出了变分潜态GPT模型(VLS-GPT),这是第一个结合了这两种方法的优点。在众多模型选项中,我们提出了用于端到端TOD系统变分学习的生成模型和推理模型,这两种模型都是基于GPT-2的自回归语言模型,可以在标记和未标记的对话数据的混合上以半监督方式进一步训练。我们提出了先采样后前向计算的策略,成功地克服了GPT在变分学习中的内存爆炸问题,加快了训练速度。半监督TOD实验在两个不同语言的基准多域数据集MultiWOZ2.1和CrossWOZ上进行。VLS-GPT被证明显著优于仅监督和半监督基线。 摘要:Recently, two approaches, fine-tuning large pre-trained language models and variational training, have attracted significant interests, separately, for semi-supervised end-to-end task-oriented dialog (TOD) systems. In this paper, we propose Variational Latent-State GPT model (VLS-GPT), which is the first to combine the strengths of the two approaches. Among many options of models, we propose the generative model and the inference model for variational learning of the end-to-end TOD system, both as auto-regressive language models based on GPT-2, which can be further trained over a mix of labeled and unlabeled dialog data in a semi-supervised manner. We develop the strategy of sampling-then-forward-computation, which successfully overcomes the memory explosion issue of using GPT in variational learning and speeds up training. Semi-supervised TOD experiments are conducted on two benchmark multi-domain datasets of different languages - MultiWOZ2.1 and CrossWOZ. VLS-GPT is shown to significantly outperform both supervised-only and semi-supervised baselines.

识别/分类(1篇)

【1】 Multilingual Speech Recognition for Low-Resource Indian Languages using Multi-Task conformer 标题:基于多任务Conformer的低资源印度语言多语种语音识别 链接:https://arxiv.org/abs/2109.03969

作者:Krishna D N 备注:5 pages. arXiv admin note: substantial text overlap with arXiv:2109.03277 摘要:Transformer最近在序列到序列的应用中非常流行,如机器翻译和语音识别。在这项工作中,我们提出了一个基于多任务学习的Transformer模型,用于印度语言的低资源多语种语音识别。我们提出的模型由一个Conformer[1]编码器和两个并行Transformer解码器组成。我们使用音素解码器(PHN-DEC)来完成音素识别任务,使用字形解码器(GRP-DEC)来预测字形序列。我们将音素识别任务视为多任务学习框架中的辅助任务。我们使用联合CTC-注意力[2]训练来联合优化音素和字形识别这两个任务的网络。在预测字形序列之前,我们使用条件解码方案将语言信息注入模型。我们的实验表明,我们提出的方法比以前的方法有显著的改进[4]。我们还表明,基于Conformer的双解码器方法优于基于Transformer的双解码器方法和单解码器方法。最后,我们比较了单语ASR模型和我们提出的多语种ASR方法。 摘要:Transformers have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. In this work, we propose a multi-task learning-based transformer model for low-resource multilingual speech recognition for Indian languages. Our proposed model consists of a conformer [1] encoder and two parallel transformer decoders. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict grapheme sequence. We consider the phoneme recognition task as an auxiliary task for our multi-task learning framework. We jointly optimize the network for both phoneme and grapheme recognition tasks using Joint CTC-Attention [2] training. We use a conditional decoding scheme to inject the language information into the model before predicting the grapheme sequence. Our experiments show that our proposed approach can obtain significant improvement over previous approaches [4]. We also show that our conformer-based dual-decoder approach outperforms both the transformer-based dual-decoder approach and single decoder approach. Finally, we compare monolingual ASR models with our proposed multilingual ASR approach.

Zero/Few/One-Shot|迁移|自适应(1篇)

【1】 PPT: Pre-trained Prompt Tuning for Few-shot Learning 标题:PPT:预先训练的快速调谐,可实现极少的学习 链接:https://arxiv.org/abs/2109.04332

作者:Yuxian Gu,Xu Han,Zhiyuan Liu,Minlie Huang 机构:Department of Computer Science and Technology, Tsinghua University & BAAI 备注:10 pages, 4 figures 摘要:预训练语言模型(PLM)提示通过弥合预训练任务和各种下游任务之间的差距,显示出显著的性能。在这些方法中,prompt tuning冻结PLM,只调整软提示,为使大规模PLM适应下游任务提供了一种高效的解决方案。然而,即时调优尚未得到充分探索。在我们的试点实验中,我们发现,当下游数据足够时,快速调谐的性能与传统的全模型微调相当,而在少数镜头学习设置下,快速调谐的性能要差得多,这可能会阻碍快速调谐在实践中的应用。我们将这种低性能归因于初始化软提示的方式。因此,在这项工作中,我们建议通过在预训练阶段添加软提示来预训练提示,以获得更好的初始化效果。我们将这个经过预先训练的提示调优框架命名为“PPT”。为了保证PPT的通用性,我们将相似的分类任务制定成统一的任务形式,并为该统一任务预训练软提示。大量的实验表明,在全数据和少量镜头设置下,为下游任务调整预先训练的提示可以达到甚至优于全模型微调。我们的方法在实际应用中是有效的。 摘要:Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks. Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks. However, prompt tuning is yet to be fully explored. In our pilot experiments, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings, which may hinder the application of prompt tuning in practice. We attribute this low performance to the manner of initializing soft prompts. Therefore, in this work, we propose to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization. We name this Pre-trained Prompt Tuning framework "PPT". To ensure the generalization of PPT, we formulate similar classification tasks into a unified task form and pre-train soft prompts for this unified task. Extensive experiments show that tuning pre-trained prompts for downstream tasks can reach or even outperform full-model fine-tuning under both full-data and few-shot settings. Our approach is effective and efficient for using large-scale PLMs in practice.
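
下面示意"冻结PLM、只训练软提示"的基本做法:把一段可学习的提示向量拼到输入词向量之前,仅对这段向量求梯度。这里借用HuggingFace Transformers的接口写一个最小示例;PPT中"先在预训练阶段学习提示再用于初始化"的部分这里没有体现,细节见论文。

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer

name = "bert-base-uncased"
tokenizer, model = AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)
for p in model.parameters():                   # 冻结整个预训练模型
    p.requires_grad = False

prompt_len, hidden = 20, model.config.hidden_size
soft_prompt = nn.Parameter(torch.randn(prompt_len, hidden) * 0.02)   # 唯一可训练的参数

def forward_with_prompt(sentences):
    enc = tokenizer(sentences, return_tensors="pt", padding=True)
    tok_emb = model.get_input_embeddings()(enc["input_ids"])         # [B, L, H]
    B = tok_emb.size(0)
    prompt = soft_prompt.unsqueeze(0).expand(B, -1, -1)              # [B, P, H]
    mask = torch.cat([torch.ones(B, prompt_len, dtype=enc["attention_mask"].dtype),
                      enc["attention_mask"]], dim=1)
    return model(inputs_embeds=torch.cat([prompt, tok_emb], dim=1),
                 attention_mask=mask).last_hidden_state

# 优化器只更新软提示:torch.optim.Adam([soft_prompt], lr=1e-3)
print(forward_with_prompt(["prompt tuning keeps the PLM frozen"]).shape)
```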

Word2Vec|文本|单词(2篇)

【1】 Word-Level Coreference Resolution 标题:词级共指解析 链接:https://arxiv.org/abs/2109.04127

作者:Vladimir Dobrovolskii 机构:ABBYY Moscow, Russia 备注:Accepted to EMNLP-2021 摘要:最近的共指消解模型在很大程度上依赖于跨度表示来发现词跨度之间的共指联系。由于跨度数量相对于文本长度为$O(n^2)$,而潜在链接数量为$O(n^4)$,因此需要使用各种剪枝技术使该方法在计算上可行。我们建议改为在单个词之间(而不是词跨度之间)考虑共指链接,然后再重建词跨度。这将共指模型的复杂度降低到$O(n^2)$,并允许它考虑所有潜在提及而无需剪除其中任何一个。我们还证明,在这些改动之下,用于共指消解的SpanBERT将被RoBERTa显著超越。我们的模型在保持高效率的同时,在OntoNotes基准上与最新的共指消解系统相比具有竞争力。 摘要:Recent coreference resolution models rely heavily on span representations to find coreference links between word spans. As the number of spans is $O(n^2)$ in the length of text and the number of potential links is $O(n^4)$, various pruning techniques are necessary to make this approach computationally feasible. We propose instead to consider coreference links between individual words rather than word spans and then reconstruct the word spans. This reduces the complexity of the coreference model to $O(n^2)$ and allows it to consider all potential mentions without pruning any of them out. We also demonstrate that, with these changes, SpanBERT for coreference resolution will be significantly outperformed by RoBERTa. While being highly efficient, our model performs competitively with recent coreference resolution systems on the OntoNotes benchmark.
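
下面用PyTorch示意"在词级别上为共指链接打分"的$O(n^2)$思路:对每个词与其之前的所有词计算一个双线性得分,再加上一个表示"无先行词"的哑列后取argmax;实际系统的打分函数与随后重建词跨度的步骤比这里复杂,以下仅为示例。

```python
import torch
from torch import nn

n, d = 6, 32                          # 6个词、32维表示(示例)
word_repr = torch.randn(n, d)         # 假设来自编码器的词级表示
bilinear = nn.Parameter(torch.randn(d, d) * 0.01)

scores = word_repr @ bilinear @ word_repr.t()              # [n, n] 两两打分
allowed = torch.ones(n, n).tril(-1).bool()                 # 只允许指向更靠前的词
scores = scores.masked_fill(~allowed, float("-inf"))

dummy = torch.zeros(n, 1)                                  # 表示无先行词的哑列,得分为0
antecedent = torch.cat([dummy, scores], dim=1).argmax(dim=1) - 1
print(antecedent)                                          # -1 表示该词没有先行词
```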

【2】 A Recipe For Arbitrary Text Style Transfer with Large Language Models 标题:使用大型语言模型进行任意文本样式转换的秘诀 链接:https://arxiv.org/abs/2109.03910

作者:Emily Reif,Daphne Ippolito,Ann Yuan,Andy Coenen,Chris Callison-Burch,Jason Wei 机构:Google Research, University of Pennsylvania 摘要:在本文中,我们利用大型语言模型(LMs)来执行Zero-Shot文本样式转换。我们提出了一种称为增强Zero-Shot学习的提示方法,该方法将风格转换视为一个句子重写任务,只需要自然语言指令,而不需要对目标风格进行模型微调或示例。增强Zero-Shot学习非常简单,不仅在标准风格的转换任务(如情绪)上,而且在任意转换(如“使此情节戏剧化”或“插入隐喻”)上,都显示出有希望的结果 摘要:In this paper, we leverage large language models (LMs) to perform zero-shot text style transfer. We present a prompting method that we call augmented zero-shot learning, which frames style transfer as a sentence rewriting task and requires only a natural language instruction, without model fine-tuning or exemplars in the target style. Augmented zero-shot learning is simple and demonstrates promising results not just on standard style transfer tasks such as sentiment, but also on arbitrary transformations such as "make this melodramatic" or "insert a metaphor."
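
增强Zero-Shot学习可以理解为:在提示中放入若干其他改写任务的示例,再附上目标风格的自然语言指令,让大型LM直接续写目标改写。下面是一个示意性的提示模板,示例句与具体措辞均为假设,并非论文附录中的原始提示。

```python
def build_augmented_zero_shot_prompt(sentence, instruction):
    """按若干示例改写加目标指令的格式拼接提示,交给大型语言模型补全。"""
    exemplars = [
        ("The food was bland.", "more positive", "The food was delicious."),
        ("He walked to the store.", "more descriptive",
         "He strolled slowly to the tiny corner store."),
    ]
    parts = [f'Here is some text: "{s}". Here is a rewrite of the text, '
             f'which is {style}: "{t}".' for s, style, t in exemplars]
    parts.append(f'Here is some text: "{sentence}". Here is a rewrite of the text, '
                 f'which is {instruction}: "')
    return "\n".join(parts)

print(build_augmented_zero_shot_prompt("I went home.", "more melodramatic"))
```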

其他神经网络|深度学习|模型|建模(7篇)

【1】 AStitchInLanguageModels: Dataset and Methods for the Exploration of Idiomaticity in Pre-Trained Language Models 标题:AStitchInLanguageModels:用于探索预训练语言模型中的相对性的数据集和方法 链接:https://arxiv.org/abs/2109.04413

作者:Harish Tayyar Madabushi,Edward Gow-Smith,Carolina Scarton,Aline Villavicencio 机构:Department of Computer Science, University of Sheffield, United Kingdom 备注:Findings of EMNLP 2021. Code available at: this https URL 摘要:尽管在各种自然语言处理任务中取得了成功,但预先训练的语言模型由于严重依赖于组合性,无法有效地捕捉多词表达(MWE)的含义,尤其是习语。因此,迫切需要改进MWE表示的数据集和方法。现有数据集仅限于提供表达式的惯用程度以及MWE的文字解释和(如适用)单一的非文字解释。这项工作提出了一个新的自然出现的句子数据集,其中包含手动分类为细粒度意义集的MWE,跨越英语和葡萄牙语。我们将此数据集用于两项任务,旨在测试i)语言模型检测习语使用的能力,以及ii)语言模型生成包含习语的句子表示的有效性。我们的实验表明,在检测惯用用法的任务上,这些模型在单镜头和Few-Shot场景中表现得相当好,但在Zero-Shot场景中有很大的改进空间。在表达习语性的任务上,我们发现预训练并不总是有效的,而微调可以提供一种学习包含MWE的句子表达的有效方法。 摘要:Despite their success in a variety of NLP tasks, pre-trained language models, due to their heavy reliance on compositionality, fail in effectively capturing the meanings of multiword expressions (MWEs), especially idioms. Therefore, datasets and methods to improve the representation of MWEs are urgently needed. Existing datasets are limited to providing the degree of idiomaticity of expressions along with the literal and, where applicable, (a single) non-literal interpretation of MWEs. This work presents a novel dataset of naturally occurring sentences containing MWEs manually classified into a fine-grained set of meanings, spanning both English and Portuguese. We use this dataset in two tasks designed to test i) a language model's ability to detect idiom usage, and ii) the effectiveness of a language model in generating representations of sentences containing idioms. Our experiments demonstrate that, on the task of detecting idiomatic usage, these models perform reasonably well in the one-shot and few-shot scenarios, but that there is significant scope for improvement in the zero-shot scenario. On the task of representing idiomaticity, we find that pre-training is not always effective, while fine-tuning could provide a sample efficient method of learning representations of sentences containing MWEs.

【2】 Learning from Uneven Training Data: Unlabeled, Single Label, and Multiple Labels 标题:从不均匀的训练数据中学习:无标签、单标签和多标签 链接:https://arxiv.org/abs/2109.04408

作者:Shujian Zhang,Chengyue Gong,Eunsol Choi 机构:The University of Texas at Austin 备注:EMNLP 2021; Our code is publicly available at this https URL 摘要:训练NLP系统通常假设访问每个示例具有单个人类标签的带注释数据。鉴于注释者的不完全标注和语言固有的歧义,我们假设单一标注不足以了解语言解释的范围。我们探索了新的标签注释分布方案,为每个示例分配多个标签,用于训练示例的一小部分。以注释较少的示例为代价引入这样的多标签示例,可以在自然语言推理任务和实体类型任务中获得明显的收益,即使我们只是首先使用单个标签数据进行训练,然后使用多标签示例进行微调。通过扩展混合数据增强框架,我们提出了一种学习算法,可以从不均匀的训练示例(具有零个、一个或多个标签)中学习。该算法有效地结合了来自不均匀训练数据的信号,并在较低的注释预算和跨域设置中带来额外的收益。总之,我们的方法在两个任务中获得了一致的精度和标签分布度量,这表明使用不均匀训练数据的训练对于许多NLP任务是有益的。 摘要:Training NLP systems typically assumes access to annotated data that has a single human label per example. Given imperfect labeling from annotators and inherent ambiguity of language, we hypothesize that single label is not sufficient to learn the spectrum of language interpretation. We explore new label annotation distribution schemes, assigning multiple labels per example for a small subset of training examples. Introducing such multi label examples at the cost of annotating fewer examples brings clear gains on natural language inference task and entity typing task, even when we simply first train with a single label data and then fine tune with multi label examples. Extending a MixUp data augmentation framework, we propose a learning algorithm that can learn from uneven training examples (with zero, one, or multiple labels). This algorithm efficiently combines signals from uneven training data and brings additional gains in low annotation budget and cross domain settings. Together, our method achieves consistent gains in both accuracy and label distribution metrics in two tasks, suggesting training with uneven training data can be beneficial for many NLP tasks.
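
该方法扩展了MixUp框架以同时利用零标签、单标签和多标签的样本。下面示意其中最基础的一步:对两个样本的特征与标签分布做凸组合(多标签样本先把标注计数归一化为分布);对不均匀标签的具体处理策略请参考论文。

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=0.4, rng=None):
    """x为特征向量,y为标签分布(单标签即one-hot,多标签为归一化后的标注计数)。"""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

x1, y1 = np.random.randn(8), np.array([1.0, 0.0, 0.0])   # 单标签样本
x2, y2 = np.random.randn(8), np.array([0.5, 0.5, 0.0])   # 两个标注各占一半的多标签样本
x_mix, y_mix = mixup(x1, y1, x2, y2)
print(y_mix, y_mix.sum())                                 # 混合后的标签分布仍然和为1
```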

【3】 Learning Opinion Summarizers by Selecting Informative Reviews 标题:通过选择信息性评论学习意见摘要 链接:https://arxiv.org/abs/2109.04325

作者:Arthur Bražinskas,Mirella Lapata,Ivan Titov 机构: ILCC, University of Edinburgh, ILLC, University of Amsterdam 备注:EMNLP 2021 摘要:传统的观点总结方法是采用无监督、弱监督和少量镜头学习技术。在这项工作中,我们收集了超过31000种产品的大量总结数据集,并与用户评论进行了配对,从而实现了监督训练。然而,每个产品的评论数量很大(平均320条),这使得总结——尤其是训练总结员——不切实际。此外,许多评论的内容并没有反映在人类编写的摘要中,因此,在随机评论子集上训练的摘要员会产生幻觉。为了应对这两个挑战,我们将任务描述为共同学习选择信息性评论子集并总结这些子集中表达的观点。审查子集的选择被视为一个潜在变量,由一个小而简单的选择器预测。然后将子集输入一个更强大的摘要器。对于联合训练,我们使用摊销变分推理和策略梯度方法。我们的实验证明了选择信息性评论的重要性,从而提高了总结的质量,减少了幻觉。 摘要:Opinion summarization has been traditionally approached with unsupervised, weakly-supervised and few-shot learning techniques. In this work, we collect a large dataset of summaries paired with user reviews for over 31,000 products, enabling supervised training. However, the number of reviews per product is large (320 on average), making summarization - and especially training a summarizer - impractical. Moreover, the content of many reviews is not reflected in the human-written summaries, and, thus, the summarizer trained on random review subsets hallucinates. In order to deal with both of these challenges, we formulate the task as jointly learning to select informative subsets of reviews and summarizing the opinions expressed in these subsets. The choice of the review subset is treated as a latent variable, predicted by a small and simple selector. The subset is then fed into a more powerful summarizer. For joint training, we use amortized variational inference and policy gradient methods. Our experiments demonstrate the importance of selecting informative reviews resulting in improved quality of summaries and reduced hallucinations.

【4】 Cartography Active Learning 标题:地图学主动学习 链接:https://arxiv.org/abs/2109.04282

作者:Mike Zhang,Barbara Plank 机构:Department of Computer Science, IT University of Copenhagen 备注:Findings EMNLP 2021 摘要:我们提出了地图主动学习(CAL),这是一种新的主动学习(AL)算法,它利用模型在训练过程中对单个实例的行为作为代理来查找信息量最大的实例进行标记。CAL的灵感来源于数据地图,最近提出的数据地图旨在深入了解数据集质量(Swayamdipta等人,2020年)。我们将我们在流行的文本分类任务上的方法与常用的AL策略进行比较,后者依赖于训练后的行为。我们证明了CAL与其他常用的AL方法相比具有竞争力,这表明从小种子数据派生的训练动态可以成功地用于AL。我们通过利用数据图分析批次级统计数据,对我们的新AL方法提供了见解。我们的研究结果进一步表明,CAL可以产生更有效的数据学习策略,可以用更少的训练数据获得可比或更好的结果。 摘要:We propose Cartography Active Learning (CAL), a novel Active Learning (AL) algorithm that exploits the behavior of the model on individual instances during training as a proxy to find the most informative instances for labeling. CAL is inspired by data maps, which were recently proposed to derive insights into dataset quality (Swayamdipta et al., 2020). We compare our method on popular text classification tasks to commonly used AL strategies, which instead rely on post-training behavior. We demonstrate that CAL is competitive to other common AL methods, showing that training dynamics derived from small seed data can be successfully used for AL. We provide insights into our new AL method by analyzing batch-level statistics utilizing the data maps. Our results further show that CAL results in a more data-efficient learning strategy, achieving comparable or better results with considerably less training data.
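
CAL利用训练动态(data maps)中的统计量来挑选待标注样本。下面示意如何从各个epoch中模型赋给金标签的概率计算置信度(均值)与变异度(标准差);至于按哪种统计量、在批级别如何选样,请以论文为准。

```python
import numpy as np

def data_map_stats(gold_probs_per_epoch):
    """gold_probs_per_epoch: [num_epochs, num_examples],第e行第i列为第e个epoch时样本i金标签的概率。"""
    confidence = gold_probs_per_epoch.mean(axis=0)    # 越低通常意味着样本越"难"
    variability = gold_probs_per_epoch.std(axis=0)    # 越高意味着模型对该样本越"摇摆"
    return confidence, variability

probs = np.array([[0.90, 0.40, 0.20],
                  [0.95, 0.70, 0.25],
                  [0.97, 0.30, 0.15]])
conf, var = data_map_stats(probs)
order = np.argsort(conf)        # 一种简单的示例策略:优先标注低置信度样本
print(conf.round(2), var.round(2), order)
```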

【5】 Efficient Nearest Neighbor Language Models 标题:一种高效的最近邻语言模型 链接:https://arxiv.org/abs/2109.04212

作者:Junxian He,Graham Neubig,Taylor Berg-Kirkpatrick 机构:†Language Technologies Institute, Carnegie Mellon University, ‡Department of Computer Science and Engineering, University of California San Diego 备注:EMNLP 2021 摘要:非参数神经语言模型(NLM)利用外部数据存储学习文本的预测分布,这允许它们通过显式存储训练数据点进行学习。虽然有效,但这些模型通常需要在测试时从大型数据存储中检索,这大大增加了推理开销,从而限制了非参数NLM在实际应用中的部署。在本文中,我们以最近提出的$k$-最近邻语言模型(Khandelwal et al.,2019)为例,探索在各个维度上提高其效率的方法。在标准WikiText-103基准测试和域适配数据集上的实验表明,我们的方法能够在保持可比性能的同时,将推理速度提高6倍。我们提出的实证分析可能为未来寻求开发或部署更有效的非参数NLM的研究提供指导。 摘要:Non-parametric neural language models (NLMs) learn predictive distributions of text utilizing an external datastore, which allows them to learn through explicitly memorizing the training datapoints. While effective, these models often require retrieval from a large datastore at test time, significantly increasing the inference overhead and thus limiting the deployment of non-parametric NLMs in practical applications. In this paper, we take the recently proposed $k$-nearest neighbors language model (Khandelwal et al., 2019) as an example, exploring methods to improve its efficiency along various dimensions. Experiments on the standard WikiText-103 benchmark and domain-adaptation datasets show that our methods are able to achieve up to a 6x speed-up in inference speed while retaining comparable performance. The empirical analysis we present may provide guidelines for future research seeking to develop or deploy more efficient non-parametric NLMs.
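
kNN-LM的核心是把从外部数据存储检索到的最近邻分布与参数化LM的分布做插值;论文讨论的各种效率优化正是围绕这一步的检索与数据存储展开。下面给出插值本身的极简示意(距离度量、温度与插值系数均为示例取值)。

```python
import numpy as np

def knn_lm_interpolate(lm_probs, neighbor_dists, neighbor_tokens, vocab_size,
                       lam=0.25, temp=1.0):
    """
    lm_probs:        [V] 参数化LM给出的下一个词分布
    neighbor_dists:  [k] 检索到的k个近邻与当前上下文表示的距离
    neighbor_tokens: [k] 这些近邻在数据存储中对应的下一个词id
    """
    weights = np.exp(-np.asarray(neighbor_dists) / temp)
    weights = weights / weights.sum()
    knn_probs = np.zeros(vocab_size)
    np.add.at(knn_probs, neighbor_tokens, weights)      # 指向同一词的邻居权重累加
    return lam * knn_probs + (1 - lam) * lm_probs

lm = np.array([0.1, 0.2, 0.3, 0.3, 0.1])
p = knn_lm_interpolate(lm, [1.0, 2.0, 5.0], [2, 2, 4], vocab_size=5)
print(p, p.sum())                                       # 插值后仍是合法的概率分布
```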

【6】 Fixing exposure bias with imitation learning needs powerful oracles 标题:用模仿学习修复暴露偏见需要强大的先知 链接:https://arxiv.org/abs/2109.04114

作者:Luca Hormann,Artem Sokolov 机构:Heidelberg University, Google Research 摘要:我们使用模仿学习(IL)并借助纠错oracle来解决NMT的暴露偏差问题,并评估了一种基于SMT词格(lattice)的oracle:尽管它在无约束的oracle翻译任务中表现出色,但结果证明其剪枝过度且过于特异,无法作为IL的oracle。 摘要:We apply imitation learning (IL) to tackle the NMT exposure bias problem with error-correcting oracles, and evaluate an SMT lattice-based oracle which, despite its excellent performance in an unconstrained oracle translation task, turned out to be too pruned and idiosyncratic to serve as the oracle for IL.

【7】 Table-based Fact Verification with Salience-aware Learning 标题:基于显著感知学习的基于表格的事实验证 链接:https://arxiv.org/abs/2109.04053

作者:Fei Wang,Kexuan Sun,Jay Pujara,Pedro Szekely,Muhao Chen 机构:Department of Computer Science & Information Sciences Institute, University of Southern California 备注:EMNLP 2021 (Findings) 摘要:表提供了可用于验证文本语句的有价值的知识。虽然许多工作都考虑了基于表的事实验证,但在文本语句中,表格数据与标记的直接对齐很少可用。此外,训练广义事实验证模型需要大量的标记训练数据。在本文中,我们提出了一个新的系统来解决这些问题。受反事实因果关系的启发,我们的系统使用基于探测的显著性估计来识别语句中的标记水平显著性。显著性估计允许从两个角度增强事实验证的学习。从一个角度来看,我们的系统进行掩蔽显著性标记预测,以增强表和语句之间的对齐和推理模型。从另一个角度来看,我们的系统通过替换非显著项,应用显著性感知数据增强来生成更加多样化的训练实例集。在TabFact上的实验结果表明,所提出的显著性感知学习技术有效地提高了SOTA的性能。我们的代码在https://github.com/luka-group/Salience-aware-Learning . 摘要:Tables provide valuable knowledge that can be used to verify textual statements. While a number of works have considered table-based fact verification, direct alignments of tabular data with tokens in textual statements are rarely available. Moreover, training a generalized fact verification model requires abundant labeled training data. In this paper, we propose a novel system to address these problems. Inspired by counterfactual causality, our system identifies token-level salience in the statement with probing-based salience estimation. Salience estimation allows enhanced learning of fact verification from two perspectives. From one perspective, our system conducts masked salient token prediction to enhance the model for alignment and reasoning between the table and the statement. From the other perspective, our system applies salience-aware data augmentation to generate a more diverse set of training instances by replacing non-salient terms. Experimental results on TabFact show the effective improvement by the proposed salience-aware learning techniques, leading to the new SOTA performance on the benchmark. Our code is publicly available at https://github.com/luka-group/Salience-aware-Learning .

其他(7篇)

【1】 Tracking Turbulence Through Financial News During COVID-19 标题:通过财经新闻追踪冠状病毒期间的湍流 链接:https://arxiv.org/abs/2109.04369

作者:Philip Hossu,Natalie Parde 机构:University of Illinois at Chicago, Department of Computer Science 摘要:尽管造成了严重的人员伤亡,但新冠肺炎(COVID-19)大流行在金融市场创造了独特的不稳定条件。在这项工作中,我们揭示并讨论了在2020年大流行引发的美国金融危机期间,金融出版物中涉及情绪的关系。首先,我们为来自美国主要金融新闻出版商的文章引入一组关于金融情绪的专家注释。在探索性的数据分析之后,我们描述了一个基于CNN的架构,以解决在这种异常、动荡的环境中预测金融情绪的任务。我们表现最佳的模型取得了0.746的最高加权F1分数,建立了强大的性能基准。利用表现最佳模型的预测,我们对真实股市数据进行了统计相关性研究,发现金融新闻与标准普尔500指数、交易量、市场波动性和不同的单因子ETF之间存在有趣而强烈的关系。 摘要:Grave human toll notwithstanding, the COVID-19 pandemic created uniquely unstable conditions in financial markets. In this work we uncover and discuss relationships involving sentiment in financial publications during the 2020 pandemic-motivated U.S. financial crash. First, we introduce a set of expert annotations of financial sentiment for articles from major American financial news publishers. After an exploratory data analysis, we then describe a CNN-based architecture to address the task of predicting financial sentiment in this anomalous, tumultuous setting. Our best performing model achieves a maximum weighted F1 score of 0.746, establishing a strong performance benchmark. Using predictions from our top performing model, we close by conducting a statistical correlation study with real stock market data, finding interesting and strong relationships between financial news and the S&P 500 index, trading volume, market volatility, and different single-factor ETFs.

【2】 MetaXT: Meta Cross-Task Transfer between Disparate Label Spaces 标题:MetaXT:不同标签空间之间的元跨任务传输 链接:https://arxiv.org/abs/2109.04240

作者:Srinagesh Sharma,Guoqing Zheng,Ahmed Hassan Awadallah 机构: Microsoft Way, Redmond, WA, Microsoft Research, Ahmed H. Awadallah 摘要:尽管预先训练的语言模型具有普遍的表征能力,但将它们应用于特定的NLP任务仍然需要大量的标记数据。有效的任务微调在任务中只有少数标记的示例时会遇到挑战。在这篇文章中,我们的目标是通过开发和转移一个不同的任务来解决Few-Shot任务学习的问题,该任务允许一个相关但不同的标签空间。具体来说,我们设计了一个标签传输网络(LTN),将标签从源任务转换为目标任务进行训练。LTN和任务预测模型都是通过一个双层优化框架学习的,我们称之为MetaXT。MetaXT提供了一个原则性的解决方案,通过从源任务转移知识,使预先训练的语言模型最好地适应目标任务。从两种不同类型的标签空间差异对四个NLP任务的跨任务转移设置进行实证评估,证明了MetaXT的有效性,尤其是当目标任务中的标签数据有限时。 摘要:Albeit the universal representational power of pre-trained language models, adapting them onto a specific NLP task still requires a considerably large amount of labeled data. Effective task fine-tuning meets challenges when only a few labeled examples are present for the task. In this paper, we aim to the address of the problem of few shot task learning by exploiting and transferring from a different task which admits a related but disparate label space. Specifically, we devise a label transfer network (LTN) to transform the labels from source task to the target task of interest for training. Both the LTN and the model for task prediction are learned via a bi-level optimization framework, which we term as MetaXT. MetaXT offers a principled solution to best adapt a pre-trained language model to the target task by transferring knowledge from the source task. Empirical evaluations on cross-task transfer settings for four NLP tasks, from two different types of label space disparities, demonstrate the effectiveness of MetaXT, especially when the labeled data in the target task is limited.

【3】 Fusing task-oriented and open-domain dialogues in conversational agents 标题:在会话代理中融合面向任务和开放领域的对话 链接:https://arxiv.org/abs/2109.04137

作者:Tom Young,Frank Xing,Vlad Pandelea,Jinjie Ni,Erik Cambria 机构: School of computer science and engineering, Nanyang Technological University, School of computing, National University of Singapore 摘要:构建智能对话系统的目标在很大程度上是在两种范式下分别实现的:任务导向对话(TOD)系统,执行目标导向的功能,以及开放域对话(ODD)系统,专注于非目标导向的聊天。这两种对话模式可以在同一个对话中无缝地交织在一起,友好的人类助手很容易做到这一点。这种能力在会话代理中是可取的,因为集成使它们更容易访问和使用。本文讨论了在多回合对话中融合TOD和赔率的问题。基于流行的TOD数据集MultiWOZ,我们通过重写现有的TOD回合并添加新的奇数回合,构建了一个新的数据集FusedChat。此过程构造对话会话,其中包含来自两种对话模式的交流。它具有模式间的语境依赖性,即两种模式之间的对话相互依赖。丰富的依赖模式,包括共同引用和省略号都是特征。新的数据集包含6万个新的人类书写奇数回合和5万个重新书写的TOD回合,为测试对话模型执行跨模式对话的能力提供了一个基准。这是一项更具挑战性的任务,因为模型必须确定适当的对话模式,并根据模式间上下文生成响应。但这样的模型更能模拟人的对话能力。我们在此任务中评估基线模型,包括两阶段模型和融合模型。我们公开发布FusedChat和基线,以推动跨模式对话系统的未来工作https://github.com/tomyoung903/FusedChat. 摘要:The goal of building intelligent dialogue systems has largely been \textit{separately} pursued under two paradigms: task-oriented dialogue (TOD) systems, which perform goal-oriented functions, and open-domain dialogue (ODD) systems, which focus on non-goal-oriented chitchat. The two dialogue modes can potentially be intertwined together seamlessly in the same conversation, as easily done by a friendly human assistant. Such ability is desirable in conversational agents, as the integration makes them more accessible and useful. Our paper addresses this problem of fusing TODs and ODDs in multi-turn dialogues. Based on the popular TOD dataset MultiWOZ, we build a new dataset FusedChat, by rewriting the existing TOD turns and adding new ODD turns. This procedure constructs conversation sessions containing exchanges from both dialogue modes. It features inter-mode contextual dependency, i.e., the dialogue turns from the two modes depend on each other. Rich dependency patterns including co-reference and ellipsis are features. The new dataset, with 60k new human-written ODD turns and 5k re-written TOD turns, offers a benchmark to test a dialogue model's ability to perform inter-mode conversations. This is a more challenging task since the model has to determine the appropriate dialogue mode and generate the response based on the inter-mode context. But such models would better mimic human-level conversation capabilities. We evaluate baseline models on this task, including \textit{classification-based} two-stage models and \textit{two-in-one} fused models. We publicly release FusedChat and the baselines to propel future work on inter-mode dialogue systems https://github.com/tomyoung903/FusedChat.

【4】 Enhanced Speaker-aware Multi-party Multi-turn Dialogue Comprehension 标题:增强型说话人感知多方多轮对话理解 链接:https://arxiv.org/abs/2109.04066

作者:Xinbei Ma,Zhuosheng Zhang,Hai Zhao 机构:Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction, and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China 摘要:多党多话轮对话理解在处理来自多个说话人的复杂情景以及说话人意识话语之间的纵横交错的话语关系方面带来了前所未有的挑战。现有的大多数方法将对话语境作为普通文本处理,对关键的说话人感知线索关注不够。在这项工作中,我们提出了一个具有掩蔽注意和异构图网络的说话人感知增强模型,以从说话人属性和说话人感知关系两个方面全面捕获话语线索。通过这种全面的说话人感知模型,实验结果表明,我们的说话人感知模型有助于在基准数据集Molweni上实现最先进的性能。案例分析表明,我们的模型增强了话语与自己的说话人之间的联系,捕捉到了说话人感知的话语关系,这对对话建模至关重要。 摘要:Multi-party multi-turn dialogue comprehension brings unprecedented challenges on handling the complicated scenarios from multiple speakers and criss-crossed discourse relationship among speaker-aware utterances. Most existing methods deal with dialogue contexts as plain texts and pay insufficient attention to the crucial speaker-aware clues. In this work, we propose an enhanced speaker-aware model with masking attention and heterogeneous graph networks to comprehensively capture discourse clues from both sides of speaker property and speaker-aware relationships. With such comprehensive speaker-aware modeling, experimental results show that our speaker-aware model helps achieves state-of-the-art performance on the benchmark dataset Molweni. Case analysis shows that our model enhances the connections between utterances and their own speakers and captures the speaker-aware discourse relations, which are critical for dialogue modeling.

【5】 A Formal Description of Sorani Kurdish Morphology 标题:索拉尼库尔德人形态的形式化描述 链接:https://arxiv.org/abs/2109.03942

作者:Sina Ahmadi 机构:Insight Centre for Data Analytics, National University of Ireland Galway, Contents 备注:It should be noted that the current manuscript is being completed. Any suggestions or reporting of errors will be highly appreciated. 36 Pages 摘要:索拉尼库尔德语,也被称为中央库尔德语,有一个复杂的形态,特别是由于语素出现的模式。尽管库尔德语形态学的几个方面已经被研究过,例如代词内隐策略和Izafa结构,但索拉尼库尔德语形态学在计算语言学中却很少受到关注。此外,一些语素,如强调内切点=^i\c{s}和派生语素,以前没有被研究过。为了解决索拉尼的复杂形态学问题,我们以一种正式的方式对索拉尼-库尔德语的形态学和形态音韵学结构进行了全面的描述,以便它们可以用作形态学分析和合成的有限状态传感器。 摘要:Sorani Kurdish, also known as Central Kurdish, has a complex morphology, particularly due to the patterns in which morphemes appear. Although several aspects of Kurdish morphology have been studied, such as pronominal endoclitics and Izafa constructions, Sorani Kurdish morphology has received trivial attention in computational linguistics. Moreover, some morphemes, such as the emphasis endoclitic =\^i\c{s}, and derivational morphemes have not been previously studied. To tackle the complex morphology of Sorani, we provide a thorough description of Sorani Kurdish morphological and morphophonological constructions in a formal way such that they can be used as finite-state transducers for morphological analysis and synthesis.

【6】 ELIT: Emory Language and Information Toolkit 标题:Elit:埃默里语言和信息工具包 链接:https://arxiv.org/abs/2109.03903

作者:Han He,Liyan Xu,Jinho D. Choi 机构:Computer Science, Emory University, Atlanta GA , USA 摘要:我们介绍ELIT,Emory语言和信息工具包,它是一个全面的NLP框架,为核心任务提供基于转换器的端到端模型,特别关注内存效率,同时保持最先进的准确性和速度。与现有工具包相比,ELIT具有一个高效的多任务学习(MTL)模型,具有许多下游任务,包括柠檬化、词性标记、命名实体识别、依赖项分析、选区分析、语义角色标记和AMR分析。ELIT的MTL框架的主干是一个经过预训练的transformer编码器,它可以在任务之间共享,以加快推理速度。ELIT提供了在八个数据集的混合上开发的预先训练的模型。为了扩展其服务,ELIT还集成了RESTful客户机/服务器组合。在服务器端,ELIT将其功能扩展到其他任务,如标记化和共同引用解析,为最终用户提供敏捷的研究体验。所有资源,包括源代码、文档和预先训练的模型,均可在https://github.com/emorynlp/elit. 摘要:We introduce ELIT, the Emory Language and Information Toolkit, which is a comprehensive NLP framework providing transformer-based end-to-end models for core tasks with a special focus on memory efficiency while maintaining state-of-the-art accuracy and speed. Compared to existing toolkits, ELIT features an efficient Multi-Task Learning (MTL) model with many downstream tasks that include lemmatization, part-of-speech tagging, named entity recognition, dependency parsing, constituency parsing, semantic role labeling, and AMR parsing. The backbone of ELIT's MTL framework is a pre-trained transformer encoder that is shared across tasks to speed up their inference. ELIT provides pre-trained models developed on a remix of eight datasets. To scale up its service, ELIT also integrates a RESTful Client/Server combination. On the server side, ELIT extends its functionality to cover other tasks such as tokenization and coreference resolution, providing an end user with agile research experience. All resources including the source codes, documentation, and pre-trained models are publicly available at https://github.com/emorynlp/elit.

【7】 A Bayesian Framework for Information-Theoretic Probing 标题:信息论探索的贝叶斯框架 链接:https://arxiv.org/abs/2109.03853

作者:Tiago Pimentel,Ryan Cotterell 机构:University of Cambridge, ETH Zürich 备注:Accepted for publication in EMNLP 2021. Code available in this https URL 摘要:Pimentel等人(2020年)最近从信息理论的角度分析了探测。他们认为,探测应该被视为接近于一种相互信息。这导致了一个相当不直观的结论,即表征编码的目标任务信息与原始句子完全相同。然而,互信息假设一对随机变量的真实概率分布是已知的,在不知道的情况下会导致不直观的结果。本文提出了一个新的框架来衡量我们所称的贝叶斯互信息,该框架从贝叶斯代理的角度分析信息——允许在数据有限的情况下获得更直观的发现。例如,在贝叶斯MI下,数据可以添加信息,处理可以帮助,信息可以伤害,这使得机器学习应用程序更加直观。最后,我们将我们的框架应用于探索,我们认为贝叶斯互信息通过明确限制可用的背景知识来解决任务,从而自然地使提取变得容易。 摘要:Pimentel et al. (2020) recently analysed probing from an information-theoretic perspective. They argue that probing should be seen as approximating a mutual information. This led to the rather unintuitive conclusion that representations encode exactly the same information about a target task as the original sentences. The mutual information, however, assumes the true probability distribution of a pair of random variables is known, leading to unintuitive results in settings where it is not. This paper proposes a new framework to measure what we term Bayesian mutual information, which analyses information from the perspective of Bayesian agents -- allowing for more intuitive findings in scenarios with finite data. For instance, under Bayesian MI we have that data can add information, processing can help, and information can hurt, which makes it more intuitive for machine learning applications. Finally, we apply our framework to probing where we believe Bayesian mutual information naturally operationalises ease of extraction by explicitly limiting the available background knowledge to solve a task.

机器翻译,仅供参考

本文分享自微信公众号「arXiv每日学术速递」,原始发表于 2021-09-10。如有侵权,请联系 cloudcommunity@tencent.com 删除。
