
自然语言处理学术速递[9.9]

公众号-arXiv每日学术速递
发布 2021-09-16 16:53:50


cs.CL 方向,今日共计39篇

BERT(1篇)

【1】 NSP-BERT: A Prompt-based Zero-Shot Learner Through an Original Pre-training Task--Next Sentence Prediction 标题:NSP-BERT:一种通过原始预训练任务--下一句预测--实现的基于提示的零样本学习器 链接:https://arxiv.org/abs/2109.03564

作者:Yi Sun,Yu Zheng,Chao Hao,Hangping Qiu 机构:Army Engineering University of PLA 备注:11 pages, 9 figures 摘要:使用提示来让语言模型执行各种下游任务,也称为基于提示的学习或提示学习,与预训练加微调范式相比,最近取得了显著的成功。尽管如此,几乎所有基于提示的方法都是标记级的,这意味着它们都使用GPT的从左到右语言模型或BERT的掩码语言模型来执行完形填空风格的任务。在本文中,我们尝试使用被RoBERTa和其他模型放弃的BERT原始预训练任务——下一句预测(NSP),在零样本场景中完成几个NLP任务。与标记级技术不同,我们的句子级基于提示的方法NSP-BERT不需要固定提示的长度或要预测的位置,使其能够轻松处理实体链接等任务。基于NSP-BERT的特点,我们为各种下游任务提供了几种快速构建模板。我们特别提出了一种两阶段提示的词义消歧方法。我们的标签映射策略显著提高了模型在句子对任务上的性能。在FewCLUE基准上,我们的NSP-BERT在大多数任务上都优于其他零样本方法,并且接近少样本方法。 摘要:Using prompts to utilize language models to perform various downstream tasks, also known as prompt-based learning or prompt-learning, has lately gained significant success in comparison to the pre-train and fine-tune paradigm. Nonetheless, virtually all prompt-based methods are token-level, meaning they all utilize GPT's left-to-right language model or BERT's masked language model to perform cloze-style tasks. In this paper, we attempt to accomplish several NLP tasks in the zero-shot scenario using a BERT original pre-training task abandoned by RoBERTa and other models--Next Sentence Prediction (NSP). Unlike token-level techniques, our sentence-level prompt-based method NSP-BERT does not need to fix the length of the prompt or the position to be predicted, allowing it to handle tasks such as entity linking with ease. Based on the characteristics of NSP-BERT, we offer several quick building templates for various downstream tasks. We suggest a two-stage prompt method for word sense disambiguation tasks in particular. Our strategies for mapping the labels significantly enhance the model's performance on sentence pair tasks. On the FewCLUE benchmark, our NSP-BERT outperforms other zero-shot methods on most of these tasks and comes close to the few-shot methods.
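
下面给出一个示意性的小例子,展示如何利用BERT原生的NSP头对"句子+标签提示"打分,从而实现零样本分类的思路(并非论文官方实现,模型名与提示模板仅为假设示例)。

```python
import torch
from transformers import BertTokenizer, BertForNextSentencePrediction

# 仅为示意:用 NSP 头判断"原句 + 标签提示"是否连贯,以此做零样本分类
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForNextSentencePrediction.from_pretrained("bert-base-chinese")
model.eval()

def nsp_score(sentence: str, prompt: str) -> float:
    """返回 prompt 作为 sentence 下一句的概率(越高表示越连贯)。"""
    inputs = tokenizer(sentence, prompt, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits          # 形状 [1, 2],索引 0 表示"是下一句"
    return torch.softmax(logits, dim=-1)[0, 0].item()

sentence = "这部电影的剧情非常紧凑,演员表现出色。"
label_prompts = {"正面": "这是一条正面评价。", "负面": "这是一条负面评价。"}  # 假设的提示模板
pred = max(label_prompts, key=lambda lab: nsp_score(sentence, label_prompts[lab]))
print(pred)
```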

QA|VQA|问答|对话(6篇)

【1】 Cross-Policy Compliance Detection via Question Answering 标题:通过问答进行跨策略合规性检测 链接:https://arxiv.org/abs/2109.03731

作者:Marzieh Saeidi,Majid Yazdani,Andreas Vlachos 机构:Facebook AI, University of Cambridge 备注:None 摘要:策略合规性检测的任务是确保场景符合策略(例如,根据政府规则,声明有效,或在线平台中的帖子符合社区指南)。此任务以前被实例化为文本蕴涵的一种形式,由于策略的复杂性,导致准确性较差。在本文中,我们建议通过将策略符合性检测分解为问题回答来解决该问题,其中问题检查策略中声明的条件是否适用于场景,并且表达式树将答案组合起来以获得标签。尽管最初的前期注释成本很高,但我们证明了这种方法具有更好的准确性,特别是在跨策略设置中,测试期间的策略在训练中是看不见的。此外,它允许我们使用在现有大型数据集上预先训练的现有问答模型。最后,如果无法确定策略遵从性,它将明确标识场景中缺少的信息。我们使用由政府政策组成的最新数据集进行实验,并使用专家注释对其进行补充,发现注释问答分解的成本在很大程度上被改进的注释者间协议和速度所抵消。 摘要:Policy compliance detection is the task of ensuring that a scenario conforms to a policy (e.g. a claim is valid according to government rules or a post in an online platform conforms to community guidelines). This task has been previously instantiated as a form of textual entailment, which results in poor accuracy due to the complexity of the policies. In this paper we propose to address policy compliance detection via decomposing it into question answering, where questions check whether the conditions stated in the policy apply to the scenario, and an expression tree combines the answers to obtain the label. Despite the initial upfront annotation cost, we demonstrate that this approach results in better accuracy, especially in the cross-policy setup where the policies during testing are unseen in training. In addition, it allows us to use existing question answering models pre-trained on existing large datasets. Finally, it explicitly identifies the information missing from a scenario in case policy compliance cannot be determined. We conduct our experiments using a recent dataset consisting of government policies, which we augment with expert annotations and find that the cost of annotating question answering decomposition is largely offset by improved inter-annotator agreement and speed.
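
下面用一个极简的草图说明"把政策合规判断分解为若干问答,再用表达式树组合答案"这一思路(政策、问题与答案均为虚构示例,真实系统中叶子节点的答案由QA模型给出)。

```python
from dataclasses import dataclass
from typing import Callable, List

# 表达式树:叶子节点是针对场景提出的问题,内部节点用 AND/OR 组合子问题的答案
@dataclass
class Node:
    op: str                      # "AND" / "OR" / "QUESTION"
    question: str = ""
    children: List["Node"] = None

def evaluate(node: Node, answer: Callable[[str], bool]) -> bool:
    if node.op == "QUESTION":
        return answer(node.question)          # 真实系统中由 QA 模型返回是/否
    child_values = [evaluate(c, answer) for c in node.children]
    return all(child_values) if node.op == "AND" else any(child_values)

# 假设的政策:"申请人年满 18 岁,且(在本地居住 或 在本地工作)才符合条件"
policy = Node("AND", children=[
    Node("QUESTION", question="申请人是否年满 18 岁?"),
    Node("OR", children=[
        Node("QUESTION", question="申请人是否在本地居住?"),
        Node("QUESTION", question="申请人是否在本地工作?"),
    ]),
])

# 这里用固定答案代替真实的 QA 模型输出
toy_answers = {"申请人是否年满 18 岁?": True, "申请人是否在本地居住?": False, "申请人是否在本地工作?": True}
print(evaluate(policy, lambda q: toy_answers[q]))   # True:该场景符合政策
```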

【2】 Formal Query Building with Query Structure Prediction for Complex Question Answering over Knowledge Base 标题:知识库复杂问答中带查询结构预测的形式化查询构建 链接:https://arxiv.org/abs/2109.03614

作者:Yongrui Chen,Huiying Li,Yuncheng Hua,Guilin Qi 机构:School of Computer Science and Engineering, Southeast University, Nanjing, China 备注:Accepted to IJCAI 2020 摘要:形式化查询构建是基于知识库的复杂问答的重要组成部分。它旨在为问题构建正确的可执行查询。最近的方法尝试对由状态转换策略生成的候选查询进行排序。然而,这种候选生成策略忽略了查询的结构,导致了大量的噪声查询。在本文中,我们提出了一种新的形式化查询构建方法,它包括两个阶段。在第一阶段,我们预测问题的查询结构,并利用该结构约束候选查询的生成。我们提出了一种新的图形生成框架来处理结构预测任务,并设计了一个编码器-解码器模型来预测每个生成步骤中预定操作的参数。在第二阶段,我们按照前面的方法对候选查询进行排序。实验结果表明,我们的形式化查询构建方法在复杂问题上的性能优于现有方法,而在简单问题上保持竞争力。 摘要:Formal query building is an important part of complex question answering over knowledge bases. It aims to build correct executable queries for questions. Recent methods try to rank candidate queries generated by a state-transition strategy. However, this candidate generation strategy ignores the structure of queries, resulting in a considerable number of noisy queries. In this paper, we propose a new formal query building approach that consists of two stages. In the first stage, we predict the query structure of the question and leverage the structure to constrain the generation of the candidate queries. We propose a novel graph generation framework to handle the structure prediction task and design an encoder-decoder model to predict the argument of the predetermined operation in each generative step. In the second stage, we follow the previous methods to rank the candidate queries. The experimental results show that our formal query building approach outperforms existing methods on complex questions while staying competitive on simple questions.

【3】 R2-D2: A Modular Baseline for Open-Domain Question Answering 标题:R2-D2:开放领域问答的模块化基线 链接:https://arxiv.org/abs/2109.03502

作者:Martin Fajcik,Martin Docekal,Karel Ondrej,Pavel Smrz 机构:Brno University of Technology 备注:Accepted to Findings of EMNLP'21. arXiv admin note: substantial text overlap with arXiv:2102.10697 摘要:这项工作提出了一种新的四阶段开放域QA管道R2-D2(排名两次,读取两次)。该管道由检索器、段落重排序器、抽取式阅读器、生成式阅读器,以及一个聚合所有系统组件最终预测的机制组成。我们通过三个开放域QA数据集展示了它的实力:NaturalQuestions、TriviaQA和EfficientQA,并在前两个数据集上超过了最先进水平。我们的分析表明:(i)将抽取式和生成式阅读器相结合可带来最多5个精确匹配的绝对提升,其效果至少是具有不同参数的相同模型的后验平均集成的两倍,(ii)参数较少的抽取式阅读器可以在抽取式QA数据集上达到生成式阅读器的性能。 摘要:This work presents a novel four-stage open-domain QA pipeline R2-D2 (Rank twice, reaD twice). The pipeline is composed of a retriever, passage reranker, extractive reader, generative reader and a mechanism that aggregates the final prediction from all system's components. We demonstrate its strength across three open-domain QA datasets: NaturalQuestions, TriviaQA and EfficientQA, surpassing state-of-the-art on the first two. Our analysis demonstrates that: (i) combining extractive and generative reader yields absolute improvements up to 5 exact match and it is at least twice as effective as the posterior averaging ensemble of the same models with different parameters, (ii) the extractive reader with fewer parameters can match the performance of the generative reader on extractive QA datasets.

【4】 ArchivalQA: A Large-scale Benchmark Dataset for Open Domain Question Answering over Archival News Collections 标题:ArchivalQA:档案新闻集中开放领域问答的大规模基准数据集 链接:https://arxiv.org/abs/2109.03438

作者:Jiexin Wang,Adam Jatowt,Masatoshi Yoshikawa 机构:Kyoto University, Japan, University of Innsbruck, Austria 摘要:在过去几年中,由于深度学习技术的发展和大规模问答数据集的可用性,开放领域问答(ODQA)得到了迅速发展。然而,当前的数据集基本上是为同步文档收集而设计的(例如,维基百科)。时态新闻收藏,如跨越几十年的长期新闻档案,很少用于训练模型,尽管它们对我们的社会非常有价值。为了促进ODQA领域对此类历史收藏的研究,我们介绍了ArchivalQA,这是一个大型问答数据集,由1067056对问答组成,专为时态新闻问答而设计。此外,我们根据问题难度和时态表达式的包含情况创建了数据集的四个子部分,我们认为这对于训练或测试具有不同优势和能力的ODQA系统是有用的。我们介绍的新的QA数据集构建框架也可以应用于创建其他类型集合上的数据集。 摘要:In the last few years, open-domain question answering (ODQA) has advanced rapidly due to the development of deep learning techniques and the availability of large-scale QA datasets. However, the current datasets are essentially designed for synchronic document collections (e.g., Wikipedia). Temporal news collections such as long-term news archives spanning several decades, are rarely used in training the models despite they are quite valuable for our society. In order to foster the research in the field of ODQA on such historical collections, we present ArchivalQA, a large question answering dataset consisting of 1,067,056 question-answer pairs which is designed for temporal news QA. In addition, we create four subparts of our dataset based on the question difficulty levels and the containment of temporal expressions, which we believe could be useful for training or testing ODQA systems characterized by different strengths and abilities. The novel QA dataset-constructing framework that we introduce can be also applied to create datasets over other types of collections.

【5】 It is AI's Turn to Ask Human a Question: Question and Answer Pair Generation for Children Storybooks in FairytaleQA Dataset 标题:轮到人工智能向人类提问了:FairytaleQA数据集中儿童故事书的问答对生成 链接:https://arxiv.org/abs/2109.03423

作者:Bingsheng Yao,Dakuo Wang,Tongshuang Wu,Tran Hoang,Branda Sun,Toby Jia-Jun Li,Mo Yu,Ying Xu 机构:Rensselaer Polytechnic Institute, IBM Research, University of Washington, University of California Irvine, University of Notre Dame 摘要:现有问答(QA)数据集的创建主要是为了让AI能够回答人类提出的问题。但在教育应用中,教师和家长有时可能不知道应该问孩子什么问题才能最大限度地提高他们的语言学习效果。教育专家在46本童话故事书中为幼儿读者标记了一个新发布的图书QA数据集(FairytaleQA),我们为这个新应用开发了一个自动化QA生成模型体系结构。我们的模型(1)通过基于教学框架的精心设计的启发式,从给定的故事书段落中提取候选答案;(2) 使用语言模型生成对应于每个提取答案的适当问题;并且,(3)使用另一个QA模型对顶级QA对进行排名。自动和人工评估表明,我们的模型优于基线。我们还证明,我们的方法可以通过对200本未标记的故事书进行数据扩充,帮助解决儿童图书QA数据集的稀缺性问题。 摘要:Existing question answering (QA) datasets are created mainly for the application of having AI to be able to answer questions asked by humans. But in educational applications, teachers and parents sometimes may not know what questions they should ask a child that can maximize their language learning results. With a newly released book QA dataset (FairytaleQA), which educational experts labeled on 46 fairytale storybooks for early childhood readers, we developed an automated QA generation model architecture for this novel application. Our model (1) extracts candidate answers from a given storybook passage through carefully designed heuristics based on a pedagogical framework; (2) generates appropriate questions corresponding to each extracted answer using a language model; and, (3) uses another QA model to rank top QA-pairs. Automatic and human evaluations show that our model outperforms baselines. We also demonstrate that our method can help with the scarcity issue of the children's book QA dataset via data augmentation on 200 unlabeled storybooks.

【6】 Self-supervised Contrastive Cross-Modality Representation Learning for Spoken Question Answering 标题:口语答疑中的自监督对比跨模态表征学习 链接:https://arxiv.org/abs/2109.03381

作者:Chenyu You,Nuo Chen,Yuexian Zou 机构:† Department of Electrical Engineering, Yale University, New Haven, CT, USA, ‡ADSPLAB, School of ECE, Peking University, Shenzhen, China, §Peng Cheng Laboratory, Shenzhen, China 摘要:口语问答(SQA)需要对口语文档和问题进行细粒度的理解,以实现最佳答案预测。在本文中,我们提出了一种新的口语问答训练方案,包括自监督训练阶段和对比表征学习阶段。在自监督阶段,我们提出了三个辅助的自监督任务,包括话语恢复、话语插入和问题判别,并联合训练模型,在不需要任何附加数据或标注的情况下捕获语音文档之间的一致性和连贯性。然后,我们提出在对比目标中通过多种增广策略(包括跨度删除和跨度替换)学习噪声不变的话语表征。此外,我们还设计了一种时序对齐(Temporal-Alignment)注意力机制,在学习到的公共空间中对语音-文本线索进行语义对齐,从而有利于SQA任务。通过这种方式,训练方案可以更有效地指导生成模型预测更合适的答案。实验结果表明,我们的模型在三个SQA基准上达到了最先进的结果。 摘要:Spoken question answering (SQA) requires fine-grained understanding of both spoken documents and questions for the optimal answer prediction. In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage. In the self-supervised stage, we propose three auxiliary self-supervised tasks, including utterance restoration, utterance insertion, and question discrimination, and jointly train the model to capture consistency and coherence among speech documents without any additional data or annotations. We then propose to learn noise-invariant utterance representations in a contrastive objective by adopting multiple augmentation strategies, including span deletion and span substitution. Besides, we design a Temporal-Alignment attention to semantically align the speech-text clues in the learned common space and benefit the SQA tasks. By this means, the training schemes can more effectively guide the generation model to predict more proper answers. Experimental results show that our model achieves state-of-the-art results on three SQA benchmarks.

机器翻译(3篇)

【1】 Rethinking Data Augmentation for Low-Resource Neural Machine Translation: A Multi-Task Learning Approach 标题:低资源神经机器翻译中数据增强的再思考:一种多任务学习方法 链接:https://arxiv.org/abs/2109.03645

作者:Víctor M. Sánchez-Cartagena,Miquel Esplà-Gomis,Juan Antonio Pérez-Ortiz,Felipe Sánchez-Martínez 机构:Dep. de Llenguatges i Sistemes Informàtics, Universitat d’Alacant, E-, Sant Vicent del Raspeig (Spain) 备注:To be published as long paper in EMNLP 2021 摘要:在神经机器翻译的上下文中,当可用的并行数据稀少时,数据增强(DA)技术可用于生成额外的训练样本。许多DA方法旨在通过生成包含不常见单词的新句子对来扩大对经验数据分布的支持,从而使其更接近平行句子的真实数据分布。在本文中,我们建议采用一种完全不同的方法,并提出了一种多任务DA方法,在该方法中,我们通过转换生成新的句子对,例如颠倒目标句子的顺序,从而生成不连贯的目标句子。在训练过程中,这些增广句被用作多任务框架中的辅助任务,目的是在目标前缀信息量不足以预测下一个单词的情况下提供新的上下文。这加强了编码器,并迫使解码器更加关注编码器的源表示。在六个低资源翻译任务上进行的实验表明,与基线方法和DA方法相比,一致的改进旨在扩展经验数据分布的支持。使用我们的方法训练的系统更依赖于源令牌,对域转移更具鲁棒性,遭受的幻觉更少。 摘要:In the context of neural machine translation, data augmentation (DA) techniques may be used for generating additional training samples when the available parallel data are scarce. Many DA approaches aim at expanding the support of the empirical data distribution by generating new sentence pairs that contain infrequent words, thus making it closer to the true data distribution of parallel sentences. In this paper, we propose to follow a completely different approach and present a multi-task DA approach in which we generate new sentence pairs with transformations, such as reversing the order of the target sentence, which produce unfluent target sentences. During training, these augmented sentences are used as auxiliary tasks in a multi-task framework with the aim of providing new contexts where the target prefix is not informative enough to predict the next word. This strengthens the encoder and forces the decoder to pay more attention to the source representations of the encoder. Experiments carried out on six low-resource translation tasks show consistent improvements over the baseline and over DA methods aiming at extending the support of the empirical data distribution. The systems trained with our approach rely more on the source tokens, are more robust against domain shift and suffer less hallucinations.
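
下面的小片段示意论文所述思路之一:通过反转目标句词序等变换构造"不流畅"的辅助训练样本,并打上任务标签以便在多任务框架中训练(数据与标签格式均为示例假设,并非论文官方数据处理代码)。

```python
import random

def make_auxiliary_pairs(src: str, tgt: str):
    """给定一个平行句对,构造若干目标端被刻意破坏的辅助任务样本。"""
    tgt_tokens = tgt.split()
    reversed_tgt = " ".join(reversed(tgt_tokens))               # 反转目标句词序
    shuffled = tgt_tokens[:]
    random.shuffle(shuffled)
    shuffled_tgt = " ".join(shuffled)                           # 随机打乱词序
    return [
        {"task": "translate", "src": src, "tgt": tgt},          # 主任务:正常翻译
        {"task": "reverse",   "src": src, "tgt": reversed_tgt}, # 辅助任务 1
        {"task": "shuffle",   "src": src, "tgt": shuffled_tgt}, # 辅助任务 2
    ]

for ex in make_auxiliary_pairs("the cat sat on the mat", "le chat était assis sur le tapis"):
    print(ex["task"], "=>", ex["tgt"])
```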

【2】 Vision Matters When It Should: Sanity Checking Multimodal Machine Translation Models 标题:视觉在该起作用时确实重要:多模态机器翻译模型的合理性检验 链接:https://arxiv.org/abs/2109.03415

作者:Jiaoda Li,Duygu Ataman,Rico Sennrich 机构:ETH Zürich, New York University, University of Zürich, University of Edinburgh 备注:EMNLP 2021 摘要:当有视觉上下文可用时,多模态机器翻译(MMT)系统已被证明优于仅基于文本的神经机器翻译(NMT)系统。然而,最近的研究也表明,当相关图像被不相关的图像或噪声替换时,MMT模型的性能仅受到轻微影响,这表明模型可能根本没有利用视觉上下文。我们假设这可能是由常用评估基准(即Multi30K)的性质造成的:该基准中图像字幕的翻译是在没有向人类译者实际展示图像的情况下完成的。在本文中,我们提出了一项定性研究,考察了数据集在激励模型利用视觉模态方面的作用,并提出了在数据集中突出视觉信号重要性的方法,这些方法提高了模型对源图像的依赖程度。我们的研究结果表明,关于有效MMT体系结构的研究目前因缺乏合适的数据集而受到阻碍,在创建未来的MMT数据集时必须仔细考虑,对此我们也提供了有用的见解。 摘要:Multimodal machine translation (MMT) systems have been shown to outperform their text-only neural machine translation (NMT) counterparts when visual context is available. However, recent studies have also shown that the performance of MMT models is only marginally impacted when the associated image is replaced with an unrelated image or noise, which suggests that the visual context might not be exploited by the model at all. We hypothesize that this might be caused by the nature of the commonly used evaluation benchmark, also known as Multi30K, where the translations of image captions were prepared without actually showing the images to human translators. In this paper, we present a qualitative study that examines the role of datasets in stimulating the leverage of visual modality and we propose methods to highlight the importance of visual signals in the datasets which demonstrate improvements in reliance of models on the source images. Our findings suggest the research on effective MMT architectures is currently impaired by the lack of suitable datasets and careful consideration must be taken in creation of future MMT datasets, for which we also provide useful insights.

【3】 Mixup Decoding for Diverse Machine Translation 标题:面向多种机器翻译的混合解码 链接:https://arxiv.org/abs/2109.03402

作者:Jicheng Li,Pengzhi Gao,Xuanfu Wu,Yang Feng,Zhongjun He,Hua Wu,Haifeng Wang 机构: Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences (ICTCAS), University of Chinese Academy of Sciences, Baidu Inc. No. , Shangdi ,th Street, Beijing, China 备注:Findings of EMNLP 2021 摘要:多样性机器翻译旨在为给定的源语言句子生成不同的目标语言翻译。利用混合训练引入的句子潜在空间中的线性关系,我们提出了一种新的方法,MixDiversity,在解码时使用从训练语料库中抽取的不同句子对对对输入句子进行线性插值,从而为输入句子生成不同的翻译。为了进一步提高译文的忠实性和多样性,我们提出了两种简单而有效的方法来选择训练语料库中不同的句子对,并相应地调整每对句子的插值权重。此外,通过控制插值权重,我们的方法可以在不需要任何额外训练的情况下实现忠实性和多样性之间的平衡,这在以前的大多数方法中都是必需的。在WMT'16 en ro、WMT'14 en de和WMT'17 zh en上进行的实验表明,我们的方法大大优于以前所有不同的机器翻译方法。 摘要:Diverse machine translation aims at generating various target language translations for a given source language sentence. Leveraging the linear relationship in the sentence latent space introduced by the mixup training, we propose a novel method, MixDiversity, to generate different translations for the input sentence by linearly interpolating it with different sentence pairs sampled from the training corpus when decoding. To further improve the faithfulness and diversity of the translations, we propose two simple but effective approaches to select diverse sentence pairs in the training corpus and adjust the interpolation weight for each pair correspondingly. Moreover, by controlling the interpolation weight, our method can achieve the trade-off between faithfulness and diversity without any additional training, which is required in most of the previous methods. Experiments on WMT'16 en-ro, WMT'14 en-de, and WMT'17 zh-en are conducted to show that our method substantially outperforms all previous diverse machine translation methods.
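
下面用一个极简的张量示例说明"在句子隐空间中对输入句与采样句做线性插值、通过插值权重在忠实度与多样性之间权衡"这一核心思想(编码器表示用随机张量代替,接口与形状均为假设,并非MixDiversity的官方实现)。

```python
import torch

def interpolate_states(h_input: torch.Tensor, h_sampled: torch.Tensor, lam: float) -> torch.Tensor:
    """在隐空间中对输入句与采样句的表示做线性插值;lam 越大越忠实于输入句。"""
    return lam * h_input + (1.0 - lam) * h_sampled

# 玩具示例:两个句子的"编码器表示"(这里用随机张量代替真实编码器输出)
h_input   = torch.randn(1, 8, 512)     # [batch, seq_len, hidden]
h_sampled = torch.randn(1, 8, 512)

for lam in (0.9, 0.7, 0.5):            # 调节插值权重即可得到多样化的解码条件
    mixed = interpolate_states(h_input, h_sampled, lam)
    print(lam, mixed.shape)
```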

Graph|知识图谱|Knowledge(1篇)

【1】 Memory and Knowledge Augmented Language Models for Inferring Salience in Long-Form Stories 标题:推理长篇故事显著性的记忆和知识增强语言模型 链接:https://arxiv.org/abs/2109.03754

作者:David Wilmot,Frank Keller 机构:School of Informatics, University of Edinburgh 备注:Accepted to the EMNLP 2021 Conference as a long-paper, 9 pages, 15 pages with appendices and references, 2 figures, 4 tables 摘要:测量事件显著性对于理解故事至关重要。本文从巴特基本函数和惊奇理论出发,提出了一种新的无监督显著性检测方法,并将其应用于较长的叙事形式。我们改进了标准transformer语言模型,加入了一个外部知识库(源自检索增强生成),并添加了一个内存机制,以提高较长作品的性能。我们使用一种新颖的方法,从Shmoop经典文学作品语料库中使用章节对齐的摘要来获得显著性注释。我们对这些数据的评估表明,我们的显著性检测模型在非知识库和内存增强语言模型的基础上提高了性能,这两个模型对这种改进都至关重要。 摘要:Measuring event salience is essential in the understanding of stories. This paper takes a recent unsupervised method for salience detection derived from Barthes Cardinal Functions and theories of surprise and applies it to longer narrative forms. We improve the standard transformer language model by incorporating an external knowledgebase (derived from Retrieval Augmented Generation) and adding a memory mechanism to enhance performance on longer works. We use a novel approach to derive salience annotation using chapter-aligned summaries from the Shmoop corpus for classic literary works. Our evaluation against this data demonstrates that our salience detection model improves performance over and above a non-knowledgebase and memory augmented language model, both of which are crucial to this improvement.

摘要|信息提取(1篇)

【1】 Sequence Level Contrastive Learning for Text Summarization 标题:用于文本摘要的序列级对比学习 链接:https://arxiv.org/abs/2109.03481

作者:Shusheng Xu,Xingxing Zhang,Yi Wu,Furu Wei 机构: IIIS, Tsinghua Unveristy, Beijing, China, Microsoft Research Asia, Beijing, China, Shanghai Qi Zhi institute, Shanghai China 备注:2 figures, 12 tables 摘要:对比学习模型在无监督视觉表征学习中取得了巨大成功,它最大限度地提高了同一图像不同视图的特征表征之间的相似性,同时最小化了不同图像视图的特征表征之间的相似性。在文本摘要中,输出摘要是输入文档的较短形式,它们具有相似的含义。在本文中,我们提出了一个用于监督抽象文本摘要的对比学习模型,在该模型中,我们将文档、其黄金摘要和其模型生成的摘要视为具有相同均值表示的不同视图,并在训练过程中最大化它们之间的相似性。我们在三种不同的摘要数据集上改进了强序列到序列文本生成模型(即BART)。人类评估也表明,与没有对比目标的模型相比,我们的模型获得了更好的忠实度评级。 摘要:Contrastive learning models have achieved great success in unsupervised visual representation learning, which maximize the similarities between feature representations of different views of the same image, while minimize the similarities between feature representations of views of different images. In text summarization, the output summary is a shorter form of the input document and they have similar meanings. In this paper, we propose a contrastive learning model for supervised abstractive text summarization, where we view a document, its gold summary and its model generated summaries as different views of the same mean representation and maximize the similarities between them during training. We improve over a strong sequence-to-sequence text generation model (i.e., BART) on three different summarization datasets. Human evaluation also shows that our model achieves better faithfulness ratings compared to its counterpart without contrastive objectives.
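
下面给出"把文档、参考摘要与模型生成摘要视为同一语义的不同视图,并在训练中拉近它们的表示"这一对比目标的示意实现(表示向量用随机张量代替,温度等超参数为假设,仅用于说明损失的形式)。

```python
import torch
import torch.nn.functional as F

def contrastive_loss(doc: torch.Tensor, gold: torch.Tensor, gen: torch.Tensor, tau: float = 0.1):
    """把同一样本的 (文档, 参考摘要, 生成摘要) 视为正例对,同 batch 其他样本作为负例。"""
    views = [F.normalize(x, dim=-1) for x in (doc, gold, gen)]   # 3 个 [B, d] 视图
    loss, count = 0.0, 0
    for i in range(len(views)):
        for j in range(len(views)):
            if i == j:
                continue
            sim = views[i] @ views[j].T / tau            # [B, B] 相似度矩阵
            labels = torch.arange(sim.size(0))           # 对角线位置为正例
            loss = loss + F.cross_entropy(sim, labels)
            count += 1
    return loss / count

B, d = 4, 768
print(contrastive_loss(torch.randn(B, d), torch.randn(B, d), torch.randn(B, d)))
```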

推理|分析|理解|解释(3篇)

【1】 Continuous Entailment Patterns for Lexical Inference in Context 标题:语境中词汇推理的连续蕴涵模式 链接:https://arxiv.org/abs/2109.03695

作者:Martin Schmitt,Hinrich Schütze 机构:Center for Information and Language Processing (CIS), LMU Munich, Germany 备注:Accepted as a short paper at EMNLP 2021. Code available at this https URL 摘要:将预训练语言模型(PLM)与文本模式相结合在Zero-Shot和Few-Shot设置中都有帮助。对于Zero-Shot性能,设计与自我监督预训练期间看到的文本非常相似的模式是有意义的,因为模型从未见过其他任何东西。有监督的训练允许更大的灵活性。如果我们允许PLM词汇表之外的标记,则模式可以更灵活地适应PLM的特性。对比“标记”可以是任何连续向量的模式与必须在词汇表元素之间进行离散选择的模式,我们称我们的方法为连续模式(CONAN)。我们在两个已建立的上下文词汇推理(LIiC)基准(又称谓词蕴涵)上评估了CONAN,这是一项具有挑战性的自然语言理解任务,训练集相对较小。与离散模式直接比较,柯南不断提高性能,开创了新的艺术境界。我们的实验为增强PLM在LIiC上的性能的模式类型提供了有价值的见解,并提出了有关我们使用文本模式理解PLM的重要问题。 摘要:Combining a pretrained language model (PLM) with textual patterns has been shown to help in both zero- and few-shot settings. For zero-shot performance, it makes sense to design patterns that closely resemble the text seen during self-supervised pretraining because the model has never seen anything else. Supervised training allows for more flexibility. If we allow for tokens outside the PLM's vocabulary, patterns can be adapted more flexibly to a PLM's idiosyncrasies. Contrasting patterns where a "token" can be any continuous vector vs. those where a discrete choice between vocabulary elements has to be made, we call our method CONtinuous pAtterNs (CONAN). We evaluate CONAN on two established benchmarks for lexical inference in context (LIiC) a.k.a. predicate entailment, a challenging natural language understanding task with relatively small training sets. In a direct comparison with discrete patterns, CONAN consistently leads to improved performance, setting a new state of the art. Our experiments give valuable insights into the kind of pattern that enhances a PLM's performance on LIiC and raise important questions regarding our understanding of PLMs using text patterns.

【2】 Social Analysis of Young Basque Speaking Communities in Twitter 标题:推特上年轻巴斯克语社区的社会分析 链接:https://arxiv.org/abs/2109.03487

作者:J. Fernandez de Landa,R. Agerri 机构: University of the Basque Country UPVEHU{joseba 备注:None 摘要:在本文中,我们通过处理大量巴斯克语推文,从社会和语言两个方面进行人口统计分析。人口统计学特征和社会关系的研究是通过应用机器学习和现代深度学习自然语言处理(NLP)技术,将社会科学与自动文本处理相结合来进行的。更具体地说,我们的主要目标是将人口统计推断和社会分析结合起来,以检测年轻的巴斯克推特用户,并确定他们的关系或共享内容所产生的社区。这种社会和人口统计分析将完全基于自动收集的推文,使用NLP将非结构化文本信息转换为可解释的知识。 摘要:In this paper we take into account both social and linguistic aspects to perform demographic analysis by processing a large amount of tweets in Basque language. The study of demographic characteristics and social relationships are approached by applying machine learning and modern deep-learning Natural Language Processing (NLP) techniques, combining social sciences with automatic text processing. More specifically, our main objective is to combine demographic inference and social analysis in order to detect young Basque Twitter users and to identify the communities that arise from their relationships or shared content. This social and demographic analysis will be entirely based on the~automatically collected tweets using NLP to convert unstructured textual information into interpretable knowledge.

【3】 On the Challenges of Evaluating Compositional Explanations in Multi-Hop Inference: Relevance, Completeness, and Expert Ratings 标题:论多跳推理中评价成分解释的挑战:相关性、完备性和专家评分 链接:https://arxiv.org/abs/2109.03334

作者:Peter Jansen,Kelly Smith,Dan Moreno,Huitzilin Ortiz 机构:University of Arizona, USA 备注:Accepted to EMNLP 2021 摘要:构建成分解释需要模型结合两个或多个事实,这些事实合在一起说明问题答案为何正确。通常,这些"多跳"解释是相对于一个(或少量)黄金解释进行评估的。在这项工作中,我们发现这些评估大大低估了模型的性能,无论是在所包含事实的相关性方面,还是在模型生成解释的完整性方面,因为模型经常发现并生成与黄金解释不同但同样有效的解释。为了解决这个问题,我们构建了一个包含12.6万条领域专家(科学教师)相关性评分的大型语料库,用以扩充标准化科学考试题的解释语料库,并发现了另外8万条未被评为黄金的相关事实。我们基于不同的方法(生成、排序和模式)构建了三个强大的模型,经验表明,虽然专家增强的评分能提供更好的解释质量估计,但与完全人工的专家判断相比,原始(黄金)自动评估和专家增强的自动评估仍然都大大低估了性能,差距最高可达36%,且不同模型受到的影响不成比例。这对准确评估成分推理模型产生的解释提出了重大的方法学挑战。 摘要:Building compositional explanations requires models to combine two or more facts that, together, describe why the answer to a question is correct. Typically, these "multi-hop" explanations are evaluated relative to one (or a small number of) gold explanations. In this work, we show these evaluations substantially underestimate model performance, both in terms of the relevance of included facts, as well as the completeness of model-generated explanations, because models regularly discover and produce valid explanations that are different than gold explanations. To address this, we construct a large corpus of 126k domain-expert (science teacher) relevance ratings that augment a corpus of explanations to standardized science exam questions, discovering 80k additional relevant facts not rated as gold. We build three strong models based on different methodologies (generation, ranking, and schemas), and empirically show that while expert-augmented ratings provide better estimates of explanation quality, both original (gold) and expert-augmented automatic evaluations still substantially underestimate performance by up to 36% when compared with full manual expert judgements, with different models being disproportionately affected. This poses a significant methodological challenge to accurately evaluating explanations produced by compositional reasoning models.

GAN|对抗|攻击|生成相关(1篇)

【1】 Smelting Gold and Silver for Improved Multilingual AMR-to-Text Generation 标题:冶炼黄金和白银,改进多语言AMR到文本生成 链接:https://arxiv.org/abs/2109.03808

作者:Leonardo F. R. Ribeiro,Jonas Pfeiffer,Yue Zhang,Iryna Gurevych 机构:†Ubiquitous Knowledge Processing Lab, Technical University of Darmstadt, ‡School of Engineering, Westlake University 备注:Accepted as a conference paper to EMNLP 2021 摘要:最近关于多语言AMR到文本生成的工作专门关注利用白银(silver)AMR的数据扩充策略。但是,这假设生成的AMR质量很高,可能会限制向目标任务的可迁移性。在本文中,我们研究了自动生成AMR标注的不同技术,旨在研究哪种信息源可以产生更好的多语言结果。我们在黄金AMR与白银(机器翻译)句子上训练的模型,优于利用生成的白银AMR的方法。我们发现,结合这两个互补的信息源,可进一步提高多语言AMR到文本的生成。我们的模型在德语、意大利语、西班牙语和中文上大幅超过了之前的最先进水平。 摘要:Recent work on multilingual AMR-to-text generation has exclusively focused on data augmentation strategies that utilize silver AMR. However, this assumes a high quality of generated AMRs, potentially limiting the transferability to the target task. In this paper, we investigate different techniques for automatically generating AMR annotations, where we aim to study which source of information yields better multilingual results. Our models trained on gold AMR with silver (machine translated) sentences outperform approaches which leverage generated silver AMR. We find that combining both complementary sources of information further improves multilingual AMR-to-text generation. Our models surpass the previous state of the art for German, Italian, Spanish, and Chinese by a large margin.

半/弱/无监督|不确定性(1篇)

【1】 Self- and Pseudo-self-supervised Prediction of Speaker and Key-utterance for Multi-party Dialogue Reading Comprehension 标题:多方对话阅读理解中说话人和关键话语的自监督和伪自监督预测 链接:https://arxiv.org/abs/2109.03772

作者:Yiyang Li,Hai Zhao 机构: Department of Computer Science and Engineering, Shanghai Jiao Tong University, Key Laboratory of Shanghai Education Commission for Intelligent Interaction, and Cognitive Engineering, Shanghai Jiao Tong University 备注:Accepted by EMNLP 2021 Findings 摘要:多方对话机器阅读理解(MRC)在一次对话中涉及多个说话人,导致说话人信息流复杂,对话语境嘈杂,给阅读理解带来了巨大的挑战。为了缓解这些困难,以前的模型侧重于如何使用复杂的基于图形的模块和额外的手动标记数据合并这些信息,这在实际场景中通常很少见。在本文中,我们设计了两个关于说话人和关键话语的无劳动自监督和伪自监督预测任务来隐式地模拟说话人的信息流,并在长对话中捕捉显著的线索。在两个基准数据集上的实验结果证明了我们的方法相对于竞争基线和当前最先进的模型的有效性。 摘要:Multi-party dialogue machine reading comprehension (MRC) brings tremendous challenge since it involves multiple speakers at one dialogue, resulting in intricate speaker information flows and noisy dialogue contexts. To alleviate such difficulties, previous models focus on how to incorporate these information using complex graph-based modules and additional manually labeled data, which is usually rare in real scenarios. In this paper, we design two labour-free self- and pseudo-self-supervised prediction tasks on speaker and key-utterance to implicitly model the speaker information flows, and capture salient clues in a long dialogue. Experimental results on two benchmark datasets have justified the effectiveness of our method over competitive baselines and current state-of-the-art models.

检测相关(1篇)

【1】 A Dual-Channel Framework for Sarcasm Recognition by Detecting Sentiment Conflict 标题:基于情感冲突检测的反讽识别双通道框架 链接:https://arxiv.org/abs/2109.03587

作者:Yiyi Liu,Yequan Wang,Aixin Sun,Zheng Zhang,Jiafeng Guo,Xuying Meng 机构:Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China, School of Computer Science and Engineering, Nanyang Technological University, Singapore, Department of Computer Science and Technology, Tsinghua University, Beijing, China 摘要:讽刺使用矛盾心理,一个人说一些积极的东西,但实际上意味着消极,反之亦然。由于情感的复杂性和模糊性,讽刺给情感分析带来了巨大的挑战。在本文中,我们揭示了讽刺文本的本质是字面情感(通过文本的表面形式表达)与深层情感(通过文本的实际意义表达)相反。为此,我们提出了一个双通道框架,通过对字面情感和深层情感进行建模来识别情感冲突。具体而言,该框架能够检测输入文本的字面意义和深层意义之间的情感冲突。在政治辩论和Twitter数据集上的实验表明,我们的框架在讽刺识别方面取得了最好的性能。 摘要:Sarcasm employs ambivalence, where one says something positive but actually means negative, and vice versa. Due to the sophisticated and obscure sentiment, sarcasm brings in great challenges to sentiment analysis. In this paper, we show up the essence of sarcastic text is that the literal sentiment (expressed by the surface form of the text) is opposite to the deep sentiment (expressed by the actual meaning of the text). To this end, we propose a Dual-Channel Framework by modeling both literal and deep sentiments to recognize the sentiment conflict. Specifically, the proposed framework is capable of detecting the sentiment conflict between the literal and deep meanings of the input text. Experiments on the political debates and the Twitter datasets show that our framework achieves the best performance on sarcasm recognition.

识别/分类(5篇)

【1】 Forget me not: A Gentle Reminder to Mind the Simple Multi-Layer Perceptron Baseline for Text Classification 标题:别忘了我:温文尔雅地提醒你记住文本分类的简单多层感知器基线(Simple-Multi-Layer Perceptron Baseline) 链接:https://arxiv.org/abs/2109.03777

作者:Lukas Galke,Ansgar Scherp 机构:University of Kiel ZBW, Germany, University of Ulm 备注:5 pages 摘要:图神经网络引发了基于图的文本分类的复兴。我们证明了一个简单的MLP基线已经在基准数据集上实现了相当的性能,质疑了合成图结构的重要性。当考虑归纳情景时,即:。e、 ,当向语料库添加新文档时,一个简单的MLP甚至比大多数基于图的模型更出色。我们进一步微调DistilBERT进行比较,发现它优于所有最先进的模型。我们建议未来的研究至少使用MLP基线来分析结果。我们为此类基线的设计和训练提供建议。 摘要:Graph neural networks have triggered a resurgence of graph-based text classification. We show that already a simple MLP baseline achieves comparable performance on benchmark datasets, questioning the importance of synthetic graph structures. When considering an inductive scenario, i. e., when adding new documents to a corpus, a simple MLP even outperforms most graph-based models. We further fine-tune DistilBERT for comparison and find that it outperforms all state-of-the-art models. We suggest that future studies use at least an MLP baseline to contextualize the results. We provide recommendations for the design and training of such a baseline.
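
下面是文中所说"简单MLP基线"的一个极简草图:TF-IDF词袋特征接一个单隐层感知机(语料与超参数均为演示用假设,并非论文原始配置)。

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# 玩具语料,仅用于演示流程
texts  = ["great movie, loved it", "terrible plot and acting",
          "wonderful performance", "boring and too long"]
labels = [1, 0, 1, 0]

# TF-IDF 词袋特征 + 单隐层 MLP,即论文所指"简单 MLP 基线"的一种常见形式
clf = make_pipeline(TfidfVectorizer(), MLPClassifier(hidden_layer_sizes=(64,), max_iter=500))
clf.fit(texts, labels)
print(clf.predict(["loved the acting", "what a boring movie"]))
```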

【2】 Open Aspect Target Sentiment Classification with Natural Language Prompts 标题:基于自然语言提示的开放式方面目标情感分类 链接:https://arxiv.org/abs/2109.03685

作者:Ronald Seoh,Ian Birle,Mrinal Tak,Haw-Shiuan Chang,Brian Pinette,Alfred Hough 机构: University of Massachusetts Amherst, Lexalytics, Inc. 摘要:对于许多商业应用程序,我们经常试图分析与商业产品任意方面相关的情绪,尽管标签数量非常有限,甚至根本没有任何标签。然而,现有的方面目标情绪分类(ATSC)模型在没有标注数据集的情况下是不可训练的。即使使用标注数据,它们也无法达到令人满意的性能。为了解决这个问题,我们提出了一些简单的方法,通过自然语言提示更好地解决ATSC问题,使该任务能够在零样本情况下完成,并增强有监督设置下的表现,尤其是在少样本情况下。在SemEval 2014 Task 4笔记本电脑领域的少样本设置下,我们将ATSC重新表述为NLI任务的方法比有监督的SOTA方法最多高出24.13个准确率点和33.14个宏F1点。此外,我们还证明,我们的提示也可以处理隐含表述的方面:我们的模型在检测方面类别(如食物)的情绪方面达到约77%的准确率,而这些方面类别不一定出现在文本中,即使我们仅使用16条评论中明确提到的方面术语(如fajitas)来训练模型,而无提示基线的准确率仅为65%左右。 摘要:For many business applications, we often seek to analyze sentiments associated with any arbitrary aspects of commercial products, despite having a very limited amount of labels or even without any labels at all. However, existing aspect target sentiment classification (ATSC) models are not trainable if annotated datasets are not available. Even with labeled data, they fall short of reaching satisfactory performance. To address this, we propose simple approaches that better solve ATSC with natural language prompts, enabling the task under zero-shot cases and enhancing supervised settings, especially for few-shot cases. Under the few-shot setting for SemEval 2014 Task 4 laptop domain, our method of reformulating ATSC as an NLI task outperforms supervised SOTA approaches by up to 24.13 accuracy points and 33.14 macro F1 points. Moreover, we demonstrate that our prompts could handle implicitly stated aspects as well: our models reach about 77% accuracy on detecting sentiments for aspect categories (e.g., food), which do not necessarily appear within the text, even though we trained the models only with explicitly mentioned aspect terms (e.g., fajitas) from just 16 reviews - while the accuracy of the no-prompt baseline is only around 65%.
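
下面用Hugging Face的零样本分类管道(基于NLI模型)示意"把方面级情感分类改写为NLI任务"的做法;模型名与提示模板均为示例假设,并非论文官方实现。

```python
from transformers import pipeline

# 用现成的 NLI 模型做零样本推断:前提是评论文本,假设由针对某一方面的提示模板生成
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

review = "The fajitas were amazing but the service was painfully slow."
aspect = "service"
result = classifier(
    review,
    candidate_labels=["positive", "negative", "neutral"],
    hypothesis_template=f"The sentiment of the {aspect} is {{}}.",   # 针对给定方面的自然语言提示
)
print(result["labels"][0], result["scores"][0])
```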

【3】 Cross-lingual Offensive Language Identification for Low Resource Languages: The Case of Marathi 标题:低资源语言的跨语言攻击性语言识别--以马拉地语为例 链接:https://arxiv.org/abs/2109.03552

作者:Saurabh Gaikwad,Tharindu Ranasinghe,Marcos Zampieri,Christopher M. Homan 机构:Rochester Institute of Technology, USA, University of Wolverhampton, UK 备注:Accepted to RANLP 2021 摘要:攻击性语言在社交媒体上的广泛存在推动了能够自动识别此类内容的系统的发展。除了几个显著的例外,大多数关于自动攻击性语言识别的研究都涉及英语。为了解决这个缺点,我们引入了MOLD,即马拉地攻击性语言数据集。MOLD是第一个为马拉地语编译的同类数据集,从而为低资源印度-雅利安语的研究开辟了一个新领域。我们展示了在该数据集上进行的几次机器学习实验的结果,包括zero short和其他基于现有孟加拉语、英语和印地语数据的最新跨语言转换学习实验。 摘要:The widespread presence of offensive language on social media motivated the development of systems capable of recognizing such content automatically. Apart from a few notable exceptions, most research on automatic offensive language identification has dealt with English. To address this shortcoming, we introduce MOLD, the Marathi Offensive Language Dataset. MOLD is the first dataset of its kind compiled for Marathi, thus opening a new domain for research in low-resource Indo-Aryan languages. We present results from several machine learning experiments on this dataset, including zero-short and other transfer learning experiments on state-of-the-art cross-lingual transformers from existing data in Bengali, English, and Hindi.

【4】 A Dual-Decoder Conformer for Multilingual Speech Recognition 标题:一种用于多语种语音识别的双解码器Conformer 链接:https://arxiv.org/abs/2109.03277

作者:Krishna D N 机构:Freshworks Inc. 备注:5 pages 摘要:基于Transformer的模型最近在机器翻译和语音识别等序列到序列应用中非常流行。本文提出了一种用于印度语言低资源多语言语音识别的双解码器Transformer模型。我们提出的模型由一个Conformer[1]编码器、两个并行的Transformer解码器和一个语言分类器组成。我们使用音素解码器(PHN-DEC)来完成音素识别任务,使用字形解码器(GRP-DEC)来预测字形序列和语言信息。在多任务学习框架中,我们将音素识别和语言识别作为辅助任务。我们通过联合CTC-注意力[2]训练,共同优化音素识别、字形识别和语言识别任务的网络。我们的实验表明,与基线方法相比,我们可以显著降低WER。我们还表明,与单解码器方法相比,我们的双解码器方法获得了显著的改进。 摘要:Transformer-based models have recently become very popular for sequence-to-sequence applications such as machine translation and speech recognition. This work proposes a dual-decoder transformer model for low-resource multilingual speech recognition for Indian languages. Our proposed model consists of a Conformer [1] encoder, two parallel transformer decoders, and a language classifier. We use a phoneme decoder (PHN-DEC) for the phoneme recognition task and a grapheme decoder (GRP-DEC) to predict grapheme sequence along with language information. We consider phoneme recognition and language identification as auxiliary tasks in the multi-task learning framework. We jointly optimize the network for phoneme recognition, grapheme recognition, and language identification tasks with Joint CTC-Attention [2] training. Our experiments show that we can obtain a significant reduction in WER over the baseline approaches. We also show that our dual-decoder approach obtains significant improvement over the single decoder approach.

【5】 Efficient conformer: Progressive downsampling and grouped attention for automatic speech recognition 标题:Efficient Conformer:面向自动语音识别的渐进式下采样与分组注意力 链接:https://arxiv.org/abs/2109.01163

作者:Maxime Burchi,Valentin Vielzeuf 机构:Orange Labs, Cesson-Sévigné, France 备注:None 摘要:最近提出的Conformer体系结构将卷积与注意力相结合以建模局部和全局依赖,在自动语音识别中展现了最先进的性能。在本文中,我们研究如何在有限的计算预算下降低Conformer体系结构的复杂性,从而得到一种更高效的体系结构设计,我们称之为Efficient Conformer。我们在Conformer编码器中引入渐进式下采样,并提出了一种称为分组注意力的新注意机制,对于序列长度$n$、隐藏维度$d$和组大小参数$g$,它允许我们将注意力复杂度从$O(n^{2}d)$降低到$O(n^{2}d/g)$。我们还尝试使用跨步多头自注意力作为一种全局下采样操作。我们的实验在LibriSpeech数据集上进行,使用CTC和RNN-Transducer损失。我们表明,在相同的计算预算下,与Conformer架构相比,该架构以更快的训练和解码速度实现了更好的性能。我们的13M参数CTC模型在不使用语言模型的情况下,在test-clean/test-other集合上实现了3.6%/9.0%的有竞争力的WER,使用外部n-gram语言模型时为2.7%/6.7%,同时在推理时比我们的CTC Conformer基线快29%,训练速度快36%。 摘要:The recently proposed Conformer architecture has shown state-of-the-art performances in Automatic Speech Recognition by combining convolution with attention to model both local and global dependencies. In this paper, we study how to reduce the Conformer architecture complexity with a limited computing budget, leading to a more efficient architecture design that we call Efficient Conformer. We introduce progressive downsampling to the Conformer encoder and propose a novel attention mechanism named grouped attention, allowing us to reduce attention complexity from $O(n^{2}d)$ to $O(n^{2}d / g)$ for sequence length $n$, hidden dimension $d$ and group size parameter $g$. We also experiment the use of strided multi-head self-attention as a global downsampling operation. Our experiments are performed on the LibriSpeech dataset with CTC and RNN-Transducer losses. We show that within the same computing budget, the proposed architecture achieves better performances with faster training and decoding compared to the Conformer. Our 13M parameters CTC model achieves competitive WERs of 3.6%/9.0% without using a language model and 2.7%/6.7% with an external n-gram language model on the test-clean/test-other sets while being 29% faster than our CTC Conformer baseline at inference and 36% faster to train.
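
下面用一个极简的张量操作草图说明"分组注意力"为何能把注意力矩阵的长度维从 n 压到 n/g、从而把复杂度近似除以 g(仅示意复杂度来源,实现细节为假设,并非Efficient Conformer的官方代码)。

```python
import torch

def grouped_attention(x: torch.Tensor, g: int) -> torch.Tensor:
    """x: [B, n, d]。把相邻 g 个时间步拼为一组后再做自注意力,
    注意力矩阵从 (n x n) 变为 (n/g x n/g),复杂度近似除以 g。"""
    B, n, d = x.shape
    assert n % g == 0
    xg = x.reshape(B, n // g, g * d)                                           # [B, n/g, g*d]
    scores = torch.softmax(xg @ xg.transpose(1, 2) / (g * d) ** 0.5, dim=-1)   # [B, n/g, n/g]
    out = scores @ xg                                                          # [B, n/g, g*d]
    return out.reshape(B, n, d)                                                # 还原到原始时间分辨率

x = torch.randn(2, 16, 64)
print(grouped_attention(x, g=4).shape)      # torch.Size([2, 16, 64])
```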

Zero/Few/One-Shot|迁移|自适应(1篇)

【1】 Label Verbalization and Entailment for Effective Zero- and Few-Shot Relation Extraction 标题:标签言语化与蕴含:实现有效的零样本和少样本关系抽取 链接:https://arxiv.org/abs/2109.03659

作者:Oscar Sainz,Oier Lopez de Lacalle,Gorka Labaka,Ander Barrena,Eneko Agirre 机构:HiTZ Basque Center for Language Technologies - Ixa NLP Group, University of the Basque Country (UPVEHU) 备注:Accepted at EMNLP2021 摘要:关系抽取系统需要大量标注示例,而这些示例的标注成本很高。在这项工作中,我们将关系抽取重新表述为一项文本蕴含任务,每种关系只需不到15分钟即可写出简单、手工制作的关系言语化表达。该系统依赖于一个预训练的文本蕴含引擎,该引擎可以直接按原样运行(零样本,无训练示例),或在标注示例上进一步微调(少样本或完全训练)。在我们对TACRED的实验中,我们获得了63%的零样本F1;在每种关系有16个示例时获得69%的F1(在相同条件下比最佳有监督系统高17个百分点),与最先进的系统(其使用了20倍的训练数据)相比仅低4个点。我们还表明,使用更大的蕴含模型可以显著提高性能,在零样本设置下最多可提升12个点,从而可以在完全训练的情况下在TACRED上报告迄今为止的最佳结果。分析表明,我们的少样本系统在区分不同关系时特别有效,而低数据区域的性能差异主要来自对"无关系"情形的识别。 摘要:Relation extraction systems require large amounts of labeled examples which are costly to annotate. In this work we reformulate relation extraction as an entailment task, with simple, hand-made, verbalizations of relations produced in less than 15 min per relation. The system relies on a pretrained textual entailment engine which is run as-is (no training examples, zero-shot) or further fine-tuned on labeled examples (few-shot or fully trained). In our experiments on TACRED we attain 63% F1 zero-shot, 69% with 16 examples per relation (17% points better than the best supervised system on the same conditions), and only 4 points short to the state-of-the-art (which uses 20 times more training data). We also show that the performance can be improved significantly with larger entailment models, up to 12 points in zero-shot, allowing to report the best results to date on TACRED when fully trained. The analysis shows that our few-shot systems are specially effective when discriminating between relations, and that the performance difference in low data regimes comes mainly from identifying no-relation cases.
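
下面示意"为每种关系手写言语化模板,再用预训练NLI模型对蕴含打分"的零样本关系抽取流程(关系集合、模板与模型名均为示例假设,并非论文官方实现)。

```python
from transformers import pipeline

nli = pipeline("zero-shot-classification", model="roberta-large-mnli")

# 每种关系对应一条手写的言语化模板,{subj}/{obj} 会被实体替换
templates = {
    "per:employee_of": "{subj} works for {obj}.",
    "org:founded_by":  "{obj} founded {subj}.",
    "no_relation":     "{subj} and {obj} are unrelated.",
}

sentence = "Tim Cook joined Apple in 1998 and later became its CEO."
subj, obj = "Tim Cook", "Apple"
hypotheses = {rel: tpl.format(subj=subj, obj=obj) for rel, tpl in templates.items()}

# 以句子为前提,用 NLI 蕴含得分为每个关系的言语化假设打分,取最高者作为预测
result = nli(sentence, candidate_labels=list(hypotheses.values()), hypothesis_template="{}")
best_hyp = result["labels"][0]
pred_rel = [rel for rel, hyp in hypotheses.items() if hyp == best_hyp][0]
print(pred_rel)
```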

语料库(1篇)

【1】 Corpus-based Open-Domain Event Type Induction 标题:基于语料库的开放领域事件类型归纳 链接:https://arxiv.org/abs/2109.03322

作者:Jiaming Shen,Yunyi Zhang,Heng Ji,Jiawei Han 机构:Department of Computer Science, University of Illinois Urbana-Champaign, IL, USA 备注:14 pages, EMNLP 2021 main conference 摘要:传统的事件提取方法需要预定义的事件类型及其相应的注释来学习事件提取器。在实际应用中,这些先决条件通常很难满足。本文提出了一种基于语料库的开放域事件类型归纳方法,该方法可以从给定的语料库中自动发现一组事件类型。由于同一类型的事件可以用多种方式表示,我们建议将每种事件类型表示为一组对。具体地说,我们的方法(1)选择显著谓词和宾语头,(2)仅使用动词意义词典消除谓词意义的歧义,(3)通过在潜在球形空间中联合嵌入和聚类对来获得事件类型。我们在来自不同领域的三个数据集上的实验表明,我们的方法可以根据自动和人工评估发现显著和高质量的事件类型。 摘要:Traditional event extraction methods require predefined event types and their corresponding annotations to learn event extractors. These prerequisites are often hard to be satisfied in real-world applications. This work presents a corpus-based open-domain event type induction method that automatically discovers a set of event types from a given corpus. As events of the same type could be expressed in multiple ways, we propose to represent each event type as a cluster of pairs. Specifically, our method (1) selects salient predicates and object heads, (2) disambiguates predicate senses using only a verb sense dictionary, and (3) obtains event types by jointly embedding and clustering pairs in a latent spherical space. Our experiments, on three datasets from different domains, show our method can discover salient and high-quality event types, according to both automatic and human evaluations.

Word2Vec|文本|单词(2篇)

【1】 Spelling provides a precise (but sometimes misplaced) phonological target. Orthography and acoustic variability in second language word learning 标题:拼写提供了一个精确的(但有时放错地方的)语音目标。第二语言词汇学习中的正字法与声学变异性 链接:https://arxiv.org/abs/2109.03490

作者:Pauline Welby,Elsa Spinelli,Audrey Bürki 机构:Aix Marseille Univ, CNRS, LPL, Aix-en-Provence, France, Univ. Grenoble Alpes, CNRS, LPNC, Grenoble, France, Department of Linguistics, University of Potsdam, Potsdam, Germany, Corresponding author:, Laboratoire Parole et Langage (LPL) 摘要:母语为法语的参与者在两天的学习过程中学习了新颖的二语英语单词,其中一半的单词呈现了它们的正字法形式(音频正交),另一半没有(仅音频)。一组人听一个说话者念单词,另一组人听多个说话者念单词。第三天,他们完成了各种任务来评估他们的学习情况。我们的结果显示了正字法的强大影响,对于在音频正交条件下学习的单词,在生成(图片命名)和识别(图片映射)任务中的响应时间都更快。此外,对图片命名反应的共振峰分析表明,正字法输入将英语新词的发音拉向非母语(法语)语音目标。用正字法学习的单词发音更准确(离散度分数更小),但在元音空间中放错了位置(与法语元音相比,欧几里德距离更小)。最后,我们发现只有有限的证据表明基于说话者的声学可变性的影响:多个说话者学习的新词在图片命名任务中显示出更快的响应时间,但仅在纯音频条件下,这表明正交信息可能压倒了基于说话者的声学可变性的任何优势。 摘要:L1 French participants learned novel L2 English words over two days of learning sessions, with half of the words presented with their orthographic forms (Audio-Ortho) and half without (Audio only). One group heard the words pronounced by a single talker, while another group heard them pronounced by multiple talkers. On the third day, they completed a variety of tasks to evaluate their learning. Our results show a robust influence of orthography, with faster response times in both production (picture naming) and recognition (picture mapping) tasks for words learned in the Audio-Ortho condition. Moreover, formant analyses of the picture naming responses show that orthographic input pulls pronunciations of English novel words towards a non-native (French) phonological target. Words learned with their orthographic forms were pronounced more precisely (with smaller Dispersion Scores), but were misplaced in the vowel space (as reflected by smaller Euclidian distances with respect to French vowels). Finally, we found only limited evidence of an effect of talker-based acoustic variability: novel words learned with multiple talkers showed faster responses times in the picture naming task, but only in the Audio-only condition, which suggests that orthographic information may have overwhelmed any advantage of talker-based acoustic variability.

【2】 Text-Free Prosody-Aware Generative Spoken Language Modeling 标题:无文本韵律感知的生成性口语建模 链接:https://arxiv.org/abs/2109.03264

作者:Eugene Kharitonov,Ann Lee,Adam Polyak,Yossi Adi,Jade Copet,Kushal Lakhotia,Tu-Anh Nguyen,Morgane Rivière,Abdelrahman Mohamed,Emmanuel Dupoux,Wei-Ning Hsu 机构:Facebook AI Research 摘要:语音预训练主要证明了其在分类任务上的有效性,而其生成新语音的能力(类似于GPT-2生成连贯段落的能力)几乎没有被探索过。生成式口语建模(GSLM)(Lakhotia et al., 2021)是之前唯一一项解决语音预训练生成方面的工作,它用自动发现的类音素(phone-like)单元代替文本进行语言建模,并显示出生成有意义的新句子的能力。不幸的是,尽管消除了对文本的需求,GSLM中使用的单元丢弃了大部分韵律信息。因此,GSLM无法利用韵律来获得更好的理解,也无法生成富有表现力的语音。在这项工作中,我们提出了一个韵律感知的生成式口语模型(pGSLM)。它由语音的多流Transformer语言模型(MS-TLM)和将MS-TLM输出转换为波形的自适应HiFi-GAN模型组成,其中语音被表示为自动发现的单元流和韵律特征流。我们为韵律建模和生成设计了一系列度量,并重用GSLM中的度量进行内容建模。实验结果表明,pGSLM可以利用韵律改进韵律和内容建模,并在给定语音提示的情况下生成自然、有意义和连贯的语音。音频样本可在以下网址找到:https://speechbot.github.io/pgslm. 摘要:Speech pre-training has primarily demonstrated efficacy on classification tasks, while its capability of generating novel speech, similar to how GPT-2 can generate coherent paragraphs, has barely been explored. Generative Spoken Language Modeling (GSLM) (Lakhotia et al., 2021) is the only prior work addressing the generative aspects of speech pre-training, which replaces text with discovered phone-like units for language modeling and shows the ability to generate meaningful novel sentences. Unfortunately, despite eliminating the need of text, the units used in GSLM discard most of the prosodic information. Hence, GSLM fails to leverage prosody for better comprehension, and does not generate expressive speech. In this work, we present a prosody-aware generative spoken language model (pGSLM). It is composed of a multi-stream transformer language model (MS-TLM) of speech, represented as discovered unit and prosodic feature streams, and an adapted HiFi-GAN model converting MS-TLM outputs to waveforms. We devise a series of metrics for prosody modeling and generation, and re-use metrics from GSLM for content modeling. Experimental results show that the pGSLM can utilize prosody to improve both prosody and content modeling, and also generate natural, meaningful, and coherent speech given a spoken prompt. Audio samples can be found at https://speechbot.github.io/pgslm.

其他神经网络|深度学习|模型|建模(6篇)

【1】 Active Learning by Acquiring Contrastive Examples 标题:通过获取对比实例进行主动学习 链接:https://arxiv.org/abs/2109.03764

作者:Katerina Margatina,Giorgos Vernikos,Loïc Barrault,Nikolaos Aletras 机构:†University of Sheffield, ‡EPFL, ∗HEIG-VD 备注:Accepted at EMNLP 2021 摘要:主动学习常用的采集函数使用不确定性采样或多样性采样,分别旨在从未标注数据池中选择困难的数据点和多样的数据点。在这项工作中,我们结合两者的优势,提出了一个用于选择"对比示例"(contrastive examples)的采集函数,即在模型特征空间中彼此相似、但模型输出的预测概率差异最大的数据点。我们在四个自然语言理解任务和七个数据集上,将我们的方法CAL(对比主动学习)与一组不同的采集函数进行了比较。我们的实验表明,无论是域内还是域外数据,CAL在所有任务中的表现始终优于或等于表现最好的基线。我们还对我们的方法进行了广泛的消融研究,并进一步分析了所有主动获取的数据集,结果表明,与其他策略相比,CAL在不确定性和多样性之间实现了更好的权衡。 摘要:Common acquisition functions for active learning use either uncertainty or diversity sampling, aiming to select difficult and diverse data points from the pool of unlabeled data, respectively. In this work, leveraging the best of both worlds, we propose an acquisition function that opts for selecting contrastive examples, i.e. data points that are similar in the model feature space and yet the model outputs maximally different predictive likelihoods. We compare our approach, CAL (Contrastive Active Learning), with a diverse set of acquisition functions in four natural language understanding tasks and seven datasets. Our experiments show that CAL performs consistently better or equal than the best performing baseline across all tasks, on both in-domain and out-of-domain data. We also conduct an extensive ablation study of our method and we further analyze all actively acquired datasets showing that CAL achieves a better trade-off between uncertainty and diversity compared to other strategies.
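
下面是"对比主动学习"采集函数思想的一个示意实现:对每个未标注样本在特征空间中取近邻,但用与近邻预测分布的KL散度来打分,选出"特征相似、预测却差异最大"的样本(特征与概率均用随机数代替真实模型输出,仅为假设示例)。

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from scipy.special import rel_entr

def cal_scores(pool_feats, pool_probs, labeled_feats, labeled_probs, k=5):
    """对每个未标注样本:取其在已标注集中的 k 个特征近邻,
    计算与近邻预测分布的平均 KL 散度,得分越高越值得标注。"""
    nn = NearestNeighbors(n_neighbors=k).fit(labeled_feats)
    _, idx = nn.kneighbors(pool_feats)                          # [N_pool, k]
    scores = []
    for i, neigh in enumerate(idx):
        kls = [rel_entr(labeled_probs[j], pool_probs[i]).sum() for j in neigh]
        scores.append(float(np.mean(kls)))
    return np.array(scores)

rng = np.random.default_rng(0)
pool_feats, labeled_feats = rng.normal(size=(100, 32)), rng.normal(size=(50, 32))
pool_probs = rng.dirichlet(np.ones(3), size=100)                # 假设的 3 类分类概率
labeled_probs = rng.dirichlet(np.ones(3), size=50)
scores = cal_scores(pool_feats, pool_probs, labeled_feats, labeled_probs)
print(scores.argsort()[-10:])                                   # 选出得分最高的 10 个样本去标注
```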

【2】 Sustainable Modular Debiasing of Language Models 标题:语言模型的可持续模块化去偏 链接:https://arxiv.org/abs/2109.03646

作者:Anne Lauscher,Tobias Lüken,Goran Glavaš 机构:MilaNLP, Bocconi University, Via Sarfatti , Milan, Italy, Data and Web Science Group, University of Mannheim, B , Mannheim, Germany 备注:Accepted for EMNLP-Findings 2021 摘要:现代预训练语言模型(PLM)中编码的不公平的刻板印象偏见(如性别、种族或宗教偏见),对最先进语言技术的广泛采用具有负面的伦理影响。为了解决这一问题,最近引入了一系列去偏技术,以消除PLM中的这类刻板印象偏见。然而,现有的去偏方法直接修改PLM的全部参数,这除了计算成本高之外,还带有(灾难性地)遗忘预训练中获得的有用语言知识的固有风险。在这项工作中,我们提出了一种基于专用去偏适配器(称为ADELE)的更可持续的模块化去偏方法。具体地说,我们(1)将适配器模块注入原始PLM层,(2)通过在反事实增广语料库上进行语言建模训练,仅更新适配器(即保持原始PLM参数冻结)。我们在BERT的性别去偏任务上展示了ADELE:我们的广泛评估涵盖三个内在偏见度量和两个外在偏见度量,表明ADELE在缓解偏见方面非常有效。我们进一步表明,由于其模块化性质,ADELE与任务适配器结合,即使在大规模下游训练之后也能保持公平性。最后,借助多语言BERT,我们成功地将ADELE迁移到六种目标语言。 摘要:Unfair stereotypical biases (e.g., gender, racial, or religious biases) encoded in modern pretrained language models (PLMs) have negative ethical implications for widespread adoption of state-of-the-art language technology. To remedy for this, a wide range of debiasing techniques have recently been introduced to remove such stereotypical biases from PLMs. Existing debiasing methods, however, directly modify all of the PLMs parameters, which -- besides being computationally expensive -- comes with the inherent risk of (catastrophic) forgetting of useful language knowledge acquired in pretraining. In this work, we propose a more sustainable modular debiasing approach based on dedicated debiasing adapters, dubbed ADELE. Concretely, we (1) inject adapter modules into the original PLM layers and (2) update only the adapters (i.e., we keep the original PLM parameters frozen) via language modeling training on a counterfactually augmented corpus. We showcase ADELE, in gender debiasing of BERT: our extensive evaluation, encompassing three intrinsic and two extrinsic bias measures, renders ADELE, very effective in bias mitigation. We further show that -- due to its modular nature -- ADELE, coupled with task adapters, retains fairness even after large-scale downstream training. Finally, by means of multilingual BERT, we successfully transfer ADELE, to six target languages.

【3】 Discrete and Soft Prompting for Multilingual Models 标题:多语言模型的离散软提示 链接:https://arxiv.org/abs/2109.03630

作者:Mengjie Zhao,Hinrich Schütze 机构:CIS, LMU Munich, Germany 备注:EMNLP 2021 摘要:已有研究表明,在英语上,离散提示和软提示在预训练语言模型(PLM)的少样本学习中表现出色。在本文中,我们证明在多语言场景下,离散提示和软提示的表现优于微调:包括跨语言迁移,以及多语言自然语言推理的目标语言内训练。例如,在48个英语训练示例下,微调在跨语言迁移中仅获得33.74%的准确率,勉强超过多数类基线(33.33%)。相比之下,离散提示和软提示优于微调,分别达到36.43%和38.79%。我们还展示了使用英语以外多种语言的训练数据进行提示时的良好性能。 摘要:It has been shown for English that discrete and soft prompting perform strongly in few-shot learning with pretrained language models (PLMs). In this paper, we show that discrete and soft prompting perform better than finetuning in multilingual cases: Crosslingual transfer and in-language training of multilingual natural language inference. For example, with 48 English training examples, finetuning obtains 33.74% accuracy in crosslingual transfer, barely surpassing the majority baseline (33.33%). In contrast, discrete and soft prompting outperform finetuning, achieving 36.43% and 38.79%. We also demonstrate good performance of prompting with training data in multiple languages other than English.

【4】 Biomedical and Clinical Language Models for Spanish: On the Benefits of Domain-Specific Pretraining in a Mid-Resource Scenario 标题:西班牙语的生物医学和临床语言模型:关于中等资源情景下特定领域预训练的好处 链接:https://arxiv.org/abs/2109.03570

作者:Casimiro Pio Carrino,Jordi Armengol-Estapé,Asier Gutiérrez-Fandiño,Joan Llop-Palao,Marc Pàmies,Aitor Gonzalez-Agirre,Marta Villegas 机构:Text Mining Unit, Barcelona Supercomputing Center 备注:9 pages 摘要:这项工作提出了西班牙语的生物医学和临床语言模型,通过实验不同的训练前选择,如单词和子单词水平的掩蔽,改变词汇大小,使用领域数据进行测试,寻找更好的语言表征。有趣的是,在缺乏足够的临床数据从头开始训练模型的情况下,我们采用混合域预训练和跨域转移方法来生成适合真实临床数据的高性能生物临床模型。我们评估了生物医学文档命名实体识别(NER)任务和具有挑战性的出院报告的模型。与竞争性mBERT和BETO模型相比,我们在所有NER任务中的表现都显著优于它们。最后,我们通过提供一个有趣的以词汇为中心的分析,研究了该模型的词汇对NER绩效的影响。结果证实,特定领域的预训练对于在下游NER任务中实现更高的性能至关重要,即使在中等资源的情况下也是如此。据我们所知,我们为西班牙语提供了第一个基于生物医学和临床转换器的预训练语言模型,旨在促进母语西班牙语NLP在生物医学中的应用。我们的模型出版后将免费提供。 摘要:This work presents biomedical and clinical language models for Spanish by experimenting with different pretraining choices, such as masking at word and subword level, varying the vocabulary size and testing with domain data, looking for better language representations. Interestingly, in the absence of enough clinical data to train a model from scratch, we applied mixed-domain pretraining and cross-domain transfer approaches to generate a performant bio-clinical model suitable for real-world clinical data. We evaluated our models on Named Entity Recognition (NER) tasks for biomedical documents and challenging hospital discharge reports. When compared against the competitive mBERT and BETO models, we outperform them in all NER tasks by a significant margin. Finally, we studied the impact of the model's vocabulary on the NER performances by offering an interesting vocabulary-centric analysis. The results confirm that domain-specific pretraining is fundamental to achieving higher performances in downstream NER tasks, even within a mid-resource scenario. To the best of our knowledge, we provide the first biomedical and clinical transformer-based pretrained language models for Spanish, intending to boost native Spanish NLP applications in biomedicine. Our models will be made freely available after publication.

【5】 On the Transferability of Pre-trained Language Models: A Study from Artificial Datasets 标题:基于人工数据集的预训练语言模型可移植性研究 链接:https://arxiv.org/abs/2109.03537

作者:Cheng-Han Chiang,Hung-yi Lee 机构:National Taiwan University,Taiwan 备注:Preprint, under review. 10 pages, 3 figures, 2 tables 摘要:在大规模未标记文本数据上预训练语言模型(LMs)使该模型比直接在下游任务上训练的模型更容易获得优异的下游性能。在这项工作中,我们研究了除了语义之外,预训练数据中的哪些特定特征使预训练的LM优于在下游任务中从头训练的LM。我们建议使用人工构建的数据集作为预训练数据,以排除语义的影响,并进一步控制预训练语料库的特征。通过在GLUE benchmark上微调预先训练的模型,我们可以了解从具有特定特征的数据集上训练的模型中转移知识是多么有益。我们定义并讨论了人工数据集中的三个不同特征:1)在预训练和下游微调之间匹配令牌的单元或双元分布,2)序列中令牌之间存在显式依赖,3)序列中令牌之间隐式依赖的长度。我们的实验表明,预训练数据序列中的显式依赖关系对下游性能至关重要。我们的结果还表明,当在具有更大范围的隐式依赖关系的数据集上进行预训练时,模型可以获得更好的下游性能。基于我们的分析,我们发现使用人工数据集预先训练的模型在下游任务中容易学习虚假相关性。我们的工作表明,即使LMs没有对自然语言进行预训练,但一旦LMs学会对序列中的令牌依赖性建模,它们仍然可以在某些人类语言下游任务上获得可转移性。这一结果有助于我们理解预先训练的LMs的异常可转移性。 摘要:Pre-training language models (LMs) on large-scale unlabeled text data makes the model much easier to achieve exceptional downstream performance than their counterparts directly trained on the downstream tasks. In this work, we study what specific traits in the pre-training data, other than the semantics, make a pre-trained LM superior to their counterparts trained from scratch on downstream tasks. We propose to use artificially constructed datasets as the pre-training data to exclude the effect of semantics, and further control what characteristics the pre-training corpora have. By fine-tuning the pre-trained models on GLUE benchmark, we can learn how beneficial it is to transfer the knowledge from the model trained on the dataset possessing that specific trait. We define and discuss three different characteristics in the artificial dataset: 1) matching the token's uni-gram or bi-gram distribution between pre-training and downstream fine-tuning, 2) the presence of the explicit dependencies among the tokens in a sequence, 3) the length of the implicit dependencies among the tokens in a sequence. Our experiments show that the explicit dependencies in the sequences of the pre-training data are critical to the downstream performance. Our results also reveal that models achieve better downstream performance when pre-trained on a dataset with a longer range of implicit dependencies. Based on our analysis, we find that models pre-trained with artificial datasets are prone to learn spurious correlation in downstream tasks. Our work reveals that even if the LMs are not pre-trained on natural language, they still gain transferability on certain human language downstream tasks once the LMs learn to model the token dependencies in the sequences. This result helps us understand the exceptional transferability of pre-trained LMs.

【6】 Hi, my name is Martha: Using names to measure and mitigate bias in generative dialogue models 标题:嗨,我的名字是玛莎:使用名字来衡量和减轻生成式对话模型中的偏见 链接:https://arxiv.org/abs/2109.03300

作者:Eric Michael Smith,Adina Williams 机构:Facebook AI Research 摘要:所有人工智能模型都容易学到训练数据中的偏见。对于生成式对话模型而言,在包含不平衡的性别和种族/民族指称的真实人类对话上训练,可能会产生表现出习得性偏见的模型;我们在此将偏见宽泛地定义为对话中词语分布或语义内容在不同人口统计群体之间的任何可测量差异。我们通过让对话模型的两个副本进行人工对话来衡量这种偏见的强度,其中一方被设定自报一个通常与某种性别和/或种族/民族相关联的名字。我们发现,容量更大的模型往往表现出更强的性别偏见和更严重的职业性别刻板印象。我们证明了调整这些对话模型的几种方法,即姓名打乱(name scrambling)、受控生成和非似然训练(unlikelihood training),能够有效减少对话中的偏见,在下游对话任务上同样有效。当对话伙伴的名字与不同性别或种族/民族相关时,姓名打乱也能有效降低会话间词元使用上的差异。 摘要:All AI models are susceptible to learning biases in data that they are trained on. For generative dialogue models, being trained on real human conversations containing unbalanced gender and race/ethnicity references can lead to models that display learned biases, which we define here broadly as any measurable differences in the distributions of words or semantic content of conversations based on demographic groups. We measure the strength of such biases by producing artificial conversations between two copies of a dialogue model, conditioning one conversational partner to state a name commonly associated with a certain gender and/or race/ethnicity. We find that larger capacity models tend to exhibit more gender bias and greater stereotyping of occupations by gender. We show that several methods of tuning these dialogue models, specifically name scrambling, controlled generation, and unlikelihood training, are effective in reducing bias in conversation, including on a downstream conversational task. Name scrambling is also effective in lowering differences in token usage across conversations where partners have names associated with different genders or races/ethnicities.
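
下面用一个极简示例示意"按对话伙伴姓名分组、比较词元使用分布差异"这一度量思路;其中的自聊转录文本为虚构,距离度量也只是一种可能选择,具体统计量与实验设置以论文为准。

```python
# 示意性代码:按条件姓名分组后,度量两组对话词元分布差异的一种简单做法
# (对话内容为虚构示例,并非论文数据)。
from collections import Counter

def token_distribution(conversations):
    counts = Counter(tok for conv in conversations for tok in conv.lower().split())
    total = sum(counts.values())
    return {tok: c / total for tok, c in counts.items()}

def l1_distance(p, q):
    keys = set(p) | set(q)
    return sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# 假想的自聊(self-chat)转录,分别以不同姓名作为条件
convs_a = ["hi my name is martha i love gardening", "martha works as a nurse"]
convs_b = ["hi my name is james i love gardening", "james works as an engineer"]

bias_score = l1_distance(token_distribution(convs_a), token_distribution(convs_b))
print(f"token-usage L1 distance: {bias_score:.3f}")
```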

其他(6篇)

【1】 Highly Parallel Autoregressive Entity Linking with Discriminative Correction 标题:具有判别校正的高度并行自回归实体链接 链接:https://arxiv.org/abs/2109.03792

作者:Nicola De Cao,Wilker Aziz,Ivan Titov 机构:University of Amsterdam,University of Edinburgh 备注:Accepted at EMNLP2021 Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Code at this https URL . 8 pages, 1 figure, 3 tables 摘要:生成式方法最近被证明对实体消歧和实体链接(即联合提及检测与消歧)都是有效的。然而,先前提出的EL自回归建模方式存在以下问题:i)复杂(深度)解码器导致的高计算成本;ii)随源序列长度扩展的非并行解码;以及iii)需要在大量数据上训练。在这项工作中,我们提出了一种非常高效的方法,该方法将所有潜在提及的自回归链接并行化,并依赖于浅层高效解码器。此外,我们还通过一个额外的判别式成分,即一个校正项,来增强生成目标,使我们能够直接优化生成器的排序。综合起来,这些技术解决了上述所有问题:我们的模型比以前的生成方法快70倍以上,精度更高,优于标准英语数据集AIDA-CoNLL上的最先进方法。源代码可在 https://github.com/nicola-decao/efficient-autoregressive-EL 获取。 摘要:Generative approaches have been recently shown to be effective for both Entity Disambiguation and Entity Linking (i.e., joint mention detection and disambiguation). However, the previously proposed autoregressive formulation for EL suffers from i) high computational cost due to a complex (deep) decoder, ii) non-parallelizable decoding that scales with the source sequence length, and iii) the need for training on a large amount of data. In this work, we propose a very efficient approach that parallelizes autoregressive linking across all potential mentions and relies on a shallow and efficient decoder. Moreover, we augment the generative objective with an extra discriminative component, i.e., a correction term which lets us directly optimize the generator's ranking. When taken together, these techniques tackle all the above issues: our model is >70 times faster and more accurate than the previous generative method, outperforming state-of-the-art approaches on the standard English dataset AIDA-CoNLL. Source code available at https://github.com/nicola-decao/efficient-autoregressive-EL
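
下面的片段示意"生成得分加判别式校正项"这种重排序的组合方式:候选实体与两组分数均为随意构造的假设值,仅用于说明最终排序分数如何由两项相加得到,并非论文的官方实现。

```python
# 示意性代码:用"生成得分 + 判别式校正项"对候选实体重新排序的简化示例。
# 分数均为假设值,只演示组合与排序的方式。
import torch

candidates = ["Paris", "Paris_Texas", "Paris_Hilton"]
gen_logprob = torch.tensor([-1.2, -2.5, -3.0])   # 假设:浅层解码器给出的生成对数似然
corr_score = torch.tensor([0.3, 1.1, -0.5])      # 假设:可学习的判别式校正项

final_score = gen_logprob + corr_score            # 两项相加得到最终排序分数
ranking = [candidates[int(i)] for i in torch.argsort(final_score, descending=True)]
print(ranking)
```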

【2】 TrollsWithOpinion: A Dataset for Predicting Domain-specific Opinion Manipulation in Troll Memes 标题:TrollsWithOpinion:一个用于预测Troll模因中特定领域的意见操纵的数据集 链接:https://arxiv.org/abs/2109.03571

作者:Shardul Suryawanshi,Bharathi Raja Chakravarthi,Mihael Arcan,Suzanne Little,Paul Buitelaar 摘要:对带文本图像(Image with Text,IWT)巨魔模因进行分类的研究最近变得很流行。由于网络社区借助模因来表达自我,因此存在大量以模因形式出现的数据。这些模因有可能贬低、骚扰或欺凌目标个人;此外,目标个人还可能成为观点操纵的受害者。为了理解模因在观点操纵中的使用,我们定义了三个特定领域(产品、政治或其他),并将模因分为巨魔(troll)与非巨魔(not-troll)、带有或不带有观点操纵等类别。为了实现这一分析,我们用上述类别对现有数据集进行标注扩充,得到一个包含8,881个英语IWT多模态模因的数据集(TrollsWithOpinion数据集)。我们在标注数据集上进行了基线实验,结果表明,现有的最先进技术只能达到0.37的加权平均F1分数,这表明需要开发专门的技术来处理多模态巨魔模因。 摘要:Research into the classification of Image with Text (IWT) troll memes has recently become popular. Since the online community utilizes the refuge of memes to express themselves, there is an abundance of data in the form of memes. These memes have the potential to demean, harass, or bully targeted individuals. Moreover, the targeted individual could fall prey to opinion manipulation. To comprehend the use of memes in opinion manipulation, we define three specific domains (product, political or others) which we classify into troll or not-troll, with or without opinion manipulation. To enable this analysis, we enhanced an existing dataset by annotating the data with our defined classes, resulting in a dataset of 8,881 IWT or multimodal memes in the English language (TrollsWithOpinion dataset). We perform baseline experiments on the annotated dataset, and our result shows that existing state-of-the-art techniques could only reach a weighted-average F1-score of 0.37. This shows the need for a development of a specific technique to deal with multimodal troll memes.
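
摘要中报告的指标是加权平均F1分数;下面给出一个用scikit-learn计算该指标的小示例,其中的标签与预测均为虚构数据,仅用于说明指标的算法,与论文的基线系统无关。

```python
# 示意性代码:多类别巨魔模因分类的加权平均 F1 评估(标签与预测为虚构示例)。
from sklearn.metrics import f1_score

labels = ["troll_product", "not_troll", "troll_political", "not_troll", "troll_other"]
preds  = ["troll_product", "troll_other", "troll_political", "not_troll", "not_troll"]

# average="weighted":按各类样本数加权平均每一类的 F1
print("weighted F1 =", f1_score(labels, preds, average="weighted"))
```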

【3】 Time Alignment using Lip Images for Frame-based Electrolaryngeal Voice Conversion 标题:利用嘴唇图像进行基于帧的电喉语音转换的时间对准 链接:https://arxiv.org/abs/2109.03551

作者:Yi-Syuan Liou,Wen-Chin Huang,Ming-Chi Yen,Shu-Wei Tsai,Yu-Huai Peng,Tomoki Toda,Yu Tsao,Hsin-Min Wang 机构:∗ Academia Sinica, Taiwan, † Nagoya University, Japan, ‡ National Cheng Kung University Hospital, Taiwan 备注:Accepted to APSIPA ASC 2021 摘要:语音转换(VC)是一种有效的电子喉(EL)语音增强方法,旨在提高电子喉设备产生的人工语音的质量。在基于帧的VC方法中,需要在模型训练之前进行时间对齐,而动态时间规整(DTW)算法被广泛用于计算每对话语之间的最佳时间对齐。其有效性基于这样一个假设:不同说话人的相同音素具有相似的特征,可以通过测量源语音帧与目标语音帧之间的预定义距离来建立映射。然而,EL语音的特殊性可能会打破这一假设,导致次优的DTW对齐。在这项工作中,我们提出使用嘴唇图像进行时间对齐,因为我们假设与健康人相比,喉切除患者的嘴唇运动仍保持正常。我们研究了两种朴素的嘴唇表示和距离度量,实验结果表明,该方法在客观和主观评价上都明显优于纯音频对齐。 摘要:Voice conversion (VC) is an effective approach to electrolaryngeal (EL) speech enhancement, a task that aims to improve the quality of the artificial voice from an electrolarynx device. In frame-based VC methods, time alignment needs to be performed prior to model training, and the dynamic time warping (DTW) algorithm is widely adopted to compute the best time alignment between each utterance pair. The validity is based on the assumption that the same phonemes of the speakers have similar features and can be mapped by measuring a pre-defined distance between speech frames of the source and the target. However, the special characteristics of the EL speech can break the assumption, resulting in a sub-optimal DTW alignment. In this work, we propose to use lip images for time alignment, as we assume that the lip movements of laryngectomee remain normal compared to healthy people. We investigate two naive lip representations and distance metrics, and experimental results demonstrate that the proposed method can significantly outperform the audio-only alignment in terms of objective and subjective evaluations.
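
下面给出经典DTW的一个最小实现,用以说明基于帧的VC中"计算两段特征序列最佳对齐代价"的做法;特征序列可以是音频帧特征,也可以像论文那样换成唇部图像特征。示例中的特征为随机生成,仅作演示,并非论文系统。

```python
# 示意性代码:经典 DTW 的最小实现,用于对齐两段特征序列(特征为随机占位数据)。
import numpy as np

def dtw(x, y, dist=lambda a, b: np.linalg.norm(a - b)):
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(x[i - 1], y[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # 插入
                                 cost[i, j - 1],      # 删除
                                 cost[i - 1, j - 1])  # 匹配
    return cost[n, m]

rng = np.random.default_rng(0)
src_lip_feats = rng.normal(size=(40, 64))   # 假设:源(EL)说话人的唇部特征序列
tgt_lip_feats = rng.normal(size=(55, 64))   # 假设:目标(健康)说话人的唇部特征序列
print("DTW cost:", dtw(src_lip_feats, tgt_lip_feats))
```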

【4】 RefineCap: Concept-Aware Refinement for Image Captioning 标题:RefineCap:图像字幕的概念感知精化 链接:https://arxiv.org/abs/2109.03529

作者:Yekun Chai,Shuo Jin,Junliang Xing 机构:†Institute of Automation, Chinese Academy of Sciences, ‡University of Pittsburgh 备注:Accepted at ViGIL @NAACL 2021 摘要:自动将图像转换为文本涉及图像场景理解和语言建模。在本文中,我们提出了一个新的模型,称为RefineCap,它使用解码器引导的视觉语义来细化语言解码器的输出词汇表,并隐式地学习视觉标记词和图像之间的映射。提出的视觉概念细化方法可以使生成者关注图像中的语义细节,从而生成更多语义描述的字幕。与以前基于视觉概念的模型相比,我们的模型在MS-COCO数据集上实现了优异的性能。 摘要:Automatically translating images to texts involves image scene understanding and language modeling. In this paper, we propose a novel model, termed RefineCap, that refines the output vocabulary of the language decoder using decoder-guided visual semantics, and implicitly learns the mapping between visual tag words and images. The proposed Visual-Concept Refinement method can allow the generator to attend to semantic details in the image, thereby generating more semantically descriptive captions. Our model achieves superior performance on the MS-COCO dataset in comparison with previous visual-concept based models.
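
下面用一个极简片段示意"用视觉概念(标签词)得分对解码器词表logits做重加权"的直觉:词表、视觉概念得分与融合权重均为假设值,仅用于说明"词表细化"的思路,并非RefineCap的官方实现。

```python
# 示意性代码:用视觉概念得分重加权解码器词表 logits 的一种可能做法(非官方实现)。
import torch

vocab = ["a", "dog", "cat", "frisbee", "park"]
decoder_logits = torch.tensor([2.0, 1.0, 1.2, 0.3, 0.8])        # 语言解码器的原始 logits
visual_concept_score = torch.tensor([0.0, 1.5, 0.1, 1.2, 0.9])  # 假设:图像中检测到的概念得分

alpha = 0.5                                   # 假设的融合权重(超参数)
refined = decoder_logits + alpha * visual_concept_score
probs = torch.softmax(refined, dim=-1)
print({w: round(float(p), 3) for w, p in zip(vocab, probs)})
```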

【5】 DeepZensols: Deep Natural Language Processing Framework 标题:DeepZensols:深层自然语言处理框架 链接:https://arxiv.org/abs/2109.03383

作者:Paul Landes,Barbara Di Eugenio,Cornelia Caragea 机构:Department of Computer Science, University of Illinois at Chicago 摘要:通过分发公开的源代码在出版物中复制结果变得越来越流行。鉴于再现机器学习(ML)实验的难度,在减少这些结果的方差方面做出了重大努力。与任何科学一样,持续复制结果的能力有效地强化了工作的基本假设,因此,应该被视为与研究本身的新颖方面一样重要。这项工作的贡献是一个能够重现一致结果的框架,并提供了一种轻松创建、训练和评估自然语言处理(NLP)深度学习(DL)模型的方法。 摘要:Reproducing results in publications by distributing publicly available source code is becoming ever more popular. Given the difficulty of reproducing machine learning (ML) experiments, there have been significant efforts in reducing the variance of these results. As in any science, the ability to consistently reproduce results effectively strengthens the underlying hypothesis of the work, and thus, should be regarded as important as the novel aspect of the research itself. The contribution of this work is a framework that is able to reproduce consistent results and provides a means of easily creating, training, and evaluating natural language processing (NLP) deep learning (DL) models.
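
与结果可复现性这一主题相关,下面给出固定随机种子的常见通用做法;这是一般性示例,并非DeepZensols框架本身的API。

```python
# 示意性代码:复现实验时常用的随机种子固定方式(通用做法,非 DeepZensols 的 API)。
import os
import random
import numpy as np
import torch

def set_seed(seed: int = 42):
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # 让 cuDNN 走确定性算法,以牺牲部分速度换取可复现性
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)
print(torch.rand(2))  # 多次运行应得到相同结果
```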

【6】 Referee: Towards reference-free cross-speaker style transfer with low-quality data for expressive speech synthesis 标题:Referee:面向表达性语音合成的基于低质量数据的无参考跨说话人风格转换 链接:https://arxiv.org/abs/2109.03439

作者:Songxiang Liu,Shan Yang,Dan Su,Dong Yu 机构:Tencent AI Lab 备注:7 pages, preprint 摘要:文语转换(TTS)合成中的跨说话人风格转换(CSST)旨在将一种说话风格迁移到以目标说话人音色合成的语音中。大多数以往的CSST方法在训练时依赖携带目标说话风格的昂贵高质量数据,并且在生成新句子时需要参考话语来获得说话风格描述符作为条件。这项工作介绍了Referee,一种用于表达性TTS的鲁棒无参考CSST方法,它充分利用低质量数据从文本中学习说话风格。Referee由一个文本到风格(T2S)模型与一个风格到波形(S2W)模型级联而成。该方法采用音素后验图(PPG)、音素级音高和能量轮廓作为细粒度说话风格描述符,并用T2S模型从文本中预测这些描述符。此外,还采用了一种新的预训练-精调方法,仅使用易于获取的低质量数据来学习鲁棒的T2S模型。S2W模型使用高质量的目标说话人数据进行训练,用于有效聚合风格描述符并以目标说话人的音色生成高保真语音。实验结果表明,在CSST任务中,Referee优于基于全局风格标记(GST)的基线方法。 摘要:Cross-speaker style transfer (CSST) in text-to-speech (TTS) synthesis aims at transferring a speaking style to the synthesised speech in a target speaker's voice. Most previous CSST approaches rely on expensive high-quality data carrying desired speaking style during training and require a reference utterance to obtain speaking style descriptors as conditioning on the generation of a new sentence. This work presents Referee, a robust reference-free CSST approach for expressive TTS, which fully leverages low-quality data to learn speaking styles from text. Referee is built by cascading a text-to-style (T2S) model with a style-to-wave (S2W) model. Phonetic PosteriorGram (PPG), phoneme-level pitch and energy contours are adopted as fine-grained speaking style descriptors, which are predicted from text using the T2S model. A novel pretrain-refinement method is adopted to learn a robust T2S model by only using readily accessible low-quality data. The S2W model is trained with high-quality target data, which is adopted to effectively aggregate style descriptors and generate high-fidelity speech in the target speaker's voice. Experimental results are presented, showing that Referee outperforms a global-style-token (GST)-based baseline approach in CSST.
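
下面用随机初始化的占位网络,把"文本→(PPG、音高、能量)→波形"的两级级联写成最简骨架,仅用于展示T2S与S2W之间的数据流;各模块结构、维度均为假设,并非Referee的官方实现。

```python
# 示意性代码:T2S + S2W 两级级联的最简骨架(占位网络,仅展示数据流)。
import torch
import torch.nn as nn

class TextToStyle(nn.Module):            # T2S:从音素序列预测细粒度风格描述符
    def __init__(self, vocab=64, dim=128, ppg_dim=144):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.rnn = nn.GRU(dim, dim, batch_first=True)
        self.ppg_head = nn.Linear(dim, ppg_dim)   # 音素后验图(PPG)
        self.pitch_head = nn.Linear(dim, 1)       # 音素级音高
        self.energy_head = nn.Linear(dim, 1)      # 能量轮廓

    def forward(self, phoneme_ids):
        h, _ = self.rnn(self.embed(phoneme_ids))
        return self.ppg_head(h), self.pitch_head(h), self.energy_head(h)

class StyleToWave(nn.Module):            # S2W:聚合风格描述符并"生成"波形(线性上采样占位)
    def __init__(self, ppg_dim=144, hop=200):
        super().__init__()
        self.proj = nn.Linear(ppg_dim + 2, hop)

    def forward(self, ppg, pitch, energy):
        feats = torch.cat([ppg, pitch, energy], dim=-1)
        return self.proj(feats).flatten(1)        # [batch, frames * hop] 的占位"波形"

t2s, s2w = TextToStyle(), StyleToWave()
phonemes = torch.randint(0, 64, (1, 20))          # 假设的音素序列
wave = s2w(*t2s(phonemes))
print(wave.shape)
```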

机器翻译,仅供参考
