Natural Language Processing Academic Digest [7.20]

Author: WeChat official account "arXiv每日学术速递" (arXiv Daily Academic Digest)
Published 2021-07-27 11:08:15
This article is part of the column: arXiv Daily Academic Digest

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Mathematics, Economics, Statistics, Finance, Biology, and Electrical Engineering, with search, favorites, posting, and more.

cs.CL: 32 papers today

Transformer (2 papers)

【1】 Clinical Relation Extraction Using Transformer-based Models

Authors: Xi Yang, Zehao Yu, Yi Guo, Jiang Bian, Yonghui Wu
Affiliations: Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, Florida; Cancer Informatics and eHealth Core, University of Florida Health Cancer Center, Gainesville, Florida, USA
Note: 1 figure; 36 pages
Link: https://arxiv.org/abs/2107.08957
Abstract: The newly emerged transformer technology has a tremendous impact on NLP research. In the general English domain, transformer-based models have achieved state-of-the-art performances on various NLP benchmarks. In the clinical domain, researchers also have investigated transformer models for clinical applications. The goal of this study is to systematically explore three widely used transformer-based models (i.e., BERT, RoBERTa, and XLNet) for clinical relation extraction and develop an open-source package with clinical pre-trained transformer-based models to facilitate information extraction in the clinical domain. We developed a series of clinical RE models based on three transformer architectures, namely BERT, RoBERTa, and XLNet. We evaluated these models using 2 publicly available datasets from 2018 MADE1.0 and 2018 n2c2 challenges. We compared two classification strategies (binary vs. multi-class classification) and investigated two approaches to generate candidate relations in different experimental settings. In this study, we compared three transformer-based (BERT, RoBERTa, and XLNet) models for relation extraction. We demonstrated that the RoBERTa-clinical RE model achieved the best performance on the 2018 MADE1.0 dataset with an F1-score of 0.8958. On the 2018 n2c2 dataset, the XLNet-clinical model achieved the best F1-score of 0.9610. Our results indicated that the binary classification strategy consistently outperformed the multi-class classification strategy for clinical relation extraction. Our methods and models are publicly available at https://github.com/uf-hobi-informatics-lab/ClinicalTransformerRelationExtraction. We believe this work will improve current practice on clinical relation extraction and other related NLP tasks in the biomedical domain.
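
The released package builds on pretrained transformer encoders; as a rough, hypothetical sketch of the binary relation-classification setup the abstract describes (this is not the authors' code, and the base checkpoint, entity-marker tokens, and example sentence are illustrative assumptions), a candidate entity pair can be scored like this:

import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Stand-in checkpoint; the paper pretrains clinical BERT/RoBERTa/XLNet variants.
MODEL = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(MODEL)
# Mark the two candidate entities so the encoder can attend to them.
tokenizer.add_special_tokens({"additional_special_tokens": ["[E1]", "[/E1]", "[E2]", "[/E2]"]})
model = AutoModelForSequenceClassification.from_pretrained(MODEL, num_labels=2)  # binary strategy
model.resize_token_embeddings(len(tokenizer))

text = "[E1] Warfarin [/E1] was held due to an elevated [E2] INR [/E2] ."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    prob = model(**inputs).logits.softmax(-1)[0, 1].item()  # P(relation holds)

Fine-tuning would then minimize cross-entropy over labeled candidate pairs; the binary strategy framed this way is the one the abstract reports as consistently strongest.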

【2】 Dynamic Transformer for Efficient Machine Translation on Embedded Devices

Authors: Hishan Parry, Lei Xun, Amin Sabet, Jia Bi, Jonathon Hare, Geoff V. Merrett
Affiliations: School of Electronics and Computer Science, University of Southampton, Southampton, UK
Note: Accepted at MLCAD 2021
Link: https://arxiv.org/abs/2107.08199
Abstract: The Transformer architecture is widely used for machine translation tasks. However, its resource-intensive nature makes it challenging to implement on constrained embedded devices, particularly where available hardware resources can vary at run-time. We propose a dynamic machine translation model that scales the Transformer architecture based on the available resources at any particular time. The proposed approach, 'Dynamic-HAT', uses a HAT SuperTransformer as the backbone to search for SubTransformers with different accuracy-latency trade-offs at design time. The optimal SubTransformers are sampled from the SuperTransformer at run-time, depending on latency constraints. The Dynamic-HAT is tested on the Jetson Nano and the approach uses inherited SubTransformers sampled directly from the SuperTransformer with a switching time of <1s. Using inherited SubTransformers results in a BLEU score loss of <1.5% because the SubTransformer configuration is not retrained from scratch after sampling. However, to recover this loss in performance, the dimensions of the design space can be reduced to tailor it to a family of target hardware. The new reduced design space results in a BLEU score increase of approximately 1% for sub-optimal models from the original design space, with a wide range for performance scaling between 0.356s - 1.526s for the GPU and 2.9s - 7.31s for the CPU.
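
A minimal sketch of the run-time selection idea follows; the configurations, BLEU scores, and latencies below are invented placeholders, not measurements from the paper. Given a latency budget, pick the best pre-profiled SubTransformer that fits:

configs = [  # hypothetical (architecture, quality, measured latency) triples
    {"layers": 6, "dim": 640, "bleu": 27.1, "latency_s": 1.43},
    {"layers": 4, "dim": 512, "bleu": 26.0, "latency_s": 0.81},
    {"layers": 3, "dim": 512, "bleu": 25.3, "latency_s": 0.47},
]

def pick_subtransformer(budget_s):
    # Choose the highest-BLEU configuration that satisfies the budget;
    # fall back to the fastest one if nothing fits.
    feasible = [c for c in configs if c["latency_s"] <= budget_s]
    if feasible:
        return max(feasible, key=lambda c: c["bleu"])
    return min(configs, key=lambda c: c["latency_s"])

print(pick_subtransformer(0.9))  # -> the 4-layer configuration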

BERT (1 paper)

【1】 Stock Movement Prediction with Financial News using Contextualized Embedding from BERT 标题:基于BERT上下文嵌入的金融新闻股票走势预测

作者:Qinkai Chen 机构:Ecole Polytechnique, Route de Saclay, Palaiseau, France, Exoduspoint Capital Management France, Boulevard Haussmann, Paris, France 备注:22 pages, 6 figures, 7 tables 链接:https://arxiv.org/abs/2107.08721 摘要:新闻事件可以极大地影响股票市场。在本文中,我们只利用新闻标题来预测金融新闻事件后股票价格的短期变动。为了实现这一目标,我们引入了一种新的文本挖掘方法,称为微调上下文嵌入递归神经网络(FT-CE-RNN)。与以往使用静态向量表示新闻(静态嵌入)的方法相比,我们的模型使用了由Transformer(BERT)的双向编码器表示生成的标题的上下文化向量表示(上下文化嵌入)。我们的模型得到了这个股票运动预测任务的最新结果。与其他基准模型相比,该模型在精度和交易模拟方面都有显著提高。通过对彭博新闻数百万条头条新闻的各种交易模拟,我们证明了该模型在真实场景中的能力。 摘要:News events can greatly influence equity markets. In this paper, we are interested in predicting the short-term movement of stock prices after financial news events using only the headlines of the news. To achieve this goal, we introduce a new text mining method called Fine-Tuned Contextualized-Embedding Recurrent Neural Network (FT-CE-RNN). Compared with previous approaches which use static vector representations of the news (static embedding), our model uses contextualized vector representations of the headlines (contextualized embeddings) generated from Bidirectional Encoder Representations from Transformers (BERT). Our model obtains the state-of-the-art result on this stock movement prediction task. It shows significant improvement compared with other baseline models, in both accuracy and trading simulations. Through various trading simulations based on millions of headlines from Bloomberg News, we demonstrate the ability of this model in real scenarios.
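
The core modelling idea, contextualized token embeddings feeding a recurrent classifier, can be sketched as follows; the encoder checkpoint, hidden size, and example headline are illustrative assumptions, not the paper's exact configuration:

import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class HeadlineClassifier(nn.Module):
    # Sketch of a CE-RNN: BERT token states -> GRU -> up/down logits.
    def __init__(self, encoder_name="bert-base-uncased", hidden=128):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        self.gru = nn.GRU(self.encoder.config.hidden_size, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, **inputs):
        token_states = self.encoder(**inputs).last_hidden_state  # contextualized embeddings
        _, final_state = self.gru(token_states)
        return self.head(final_state[-1])

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = HeadlineClassifier()
logits = model(**tok("Company X beats quarterly earnings expectations", return_tensors="pt"))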

Machine Translation (6 papers)

【1】 Simultaneous Speech Translation for Live Subtitling: from Delay to Display

Authors: Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi
Affiliations: Fondazione Bruno Kessler, Via Sommarive, Povo, Trento, Italy; University of Trento, Italy
Link: https://arxiv.org/abs/2107.08807
Abstract: With the increased audiovisualisation of communication, the need for live subtitles in multilingual events is more relevant than ever. In an attempt to automatise the process, we aim at exploring the feasibility of simultaneous speech translation (SimulST) for live subtitling. However, the word-for-word rate of generation of SimulST systems is not optimal for displaying the subtitles in a comprehensible and readable way. In this work, we adapt SimulST systems to predict subtitle breaks along with the translation. We then propose a display mode that exploits the predicted break structure by presenting the subtitles in scrolling lines. We compare our proposed mode with a display 1) word-for-word and 2) in blocks, in terms of reading speed and delay. Experiments on three language pairs (en→it, de, fr) show that scrolling lines is the only mode achieving an acceptable reading speed while keeping delay close to a 4-second threshold. We argue that simultaneous translation for readable live subtitles still faces challenges, the main one being poor translation quality, and propose directions for steering future research.

【2】 Integrating Unsupervised Data Generation into Self-Supervised Neural Machine Translation for Low-Resource Languages

Authors: Dana Ruiter, Dietrich Klakow, Josef van Genabith, Cristina España-Bonet
Affiliations: Spoken Language Systems Group, Saarland University, Germany; DFKI GmbH, Saarland Informatics Campus, Saarbrücken, Germany
Note: 11 pages, 8 figures, accepted at MT-Summit 2021 (Research Track)
Link: https://arxiv.org/abs/2107.08772
Abstract: For most language combinations, parallel data is either scarce or simply unavailable. To address this, unsupervised machine translation (UMT) exploits large amounts of monolingual data by using synthetic data generation techniques such as back-translation and noising, while self-supervised NMT (SSNMT) identifies parallel sentences in smaller comparable data and trains on them. To date, the inclusion of UMT data generation techniques in SSNMT has not been investigated. We show that including UMT techniques into SSNMT significantly outperforms SSNMT and UMT on all tested language pairs, with improvements of up to +4.3 BLEU, +50.8 BLEU, +51.5 over SSNMT, statistical UMT and hybrid UMT, respectively, on Afrikaans to English. We further show that the combination of multilingual denoising autoencoding, SSNMT with backtranslation and bilingual finetuning enables us to learn machine translation even for distant language pairs for which only small amounts of monolingual data are available, e.g. yielding BLEU scores of 11.6 (English to Swahili).

【3】 Translatotron 2: Robust direct speech-to-speech translation

Authors: Ye Jia, Michelle Tadmor Ramanovich, Tal Remez, Roi Pomerantz
Affiliations: Google Research
Link: https://arxiv.org/abs/2107.08661
Abstract: We present Translatotron 2, a neural direct speech-to-speech translation model that can be trained end-to-end. Translatotron 2 consists of a speech encoder, a phoneme decoder, a mel-spectrogram synthesizer, and an attention module that connects all the previous three components. Experimental results suggest that Translatotron 2 outperforms the original Translatotron by a large margin in terms of translation quality and predicted speech naturalness, and drastically improves the robustness of the predicted speech by mitigating over-generation, such as babbling or long pause. We also propose a new method for retaining the source speaker's voice in the translated speech. The trained model is restricted to retain the source speaker's voice, and unlike the original Translatotron, it is not able to generate speech in a different speaker's voice, making the model more robust for production deployment, by mitigating potential misuse for creating spoofing audio artifacts. When the new method is used together with a simple concatenation-based data augmentation, the trained Translatotron 2 model is able to retain each speaker's voice for input with speaker turns.

【4】 As Easy as 1, 2, 3: Behavioural Testing of NMT Systems for Numerical Translation

Authors: Jun Wang, Chang Xu, Francisco Guzman, Ahmed El-Kishky, Benjamin I. P. Rubinstein, Trevor Cohn
Affiliations: University of Melbourne, Australia; Facebook AI; Twitter Cortex
Note: Findings of ACL, to appear
Link: https://arxiv.org/abs/2107.08357
Abstract: Mistranslated numbers have the potential to cause serious effects, such as financial loss or medical misinformation. In this work we develop comprehensive assessments of the robustness of neural machine translation systems to numerical text via behavioural testing. We explore a variety of numerical translation capabilities a system is expected to exhibit and design effective test examples to expose system underperformance. We find that numerical mistranslation is a general issue: major commercial systems and state-of-the-art research models fail on many of our test examples, for high- and low-resource languages. Our tests reveal novel errors that have not previously been reported in NMT systems, to the best of our knowledge. Lastly, we discuss strategies to mitigate numerical mistranslation.
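
In the spirit of the behavioural tests described (though the sentence template and matching rule here are assumptions, not the paper's test suite), a number-preservation check against any translation function might look like:

import random

def number_preservation_failures(translate, n_cases=100, seed=0):
    # `translate` is a placeholder for any MT system: str -> str.
    rng = random.Random(seed)
    failures = []
    for _ in range(n_cases):
        amount = rng.randint(1, 999999)
        src = f"The invoice total is {amount} dollars."
        hyp = translate(src)
        # The digit string should survive translation (separators stripped).
        if str(amount) not in hyp.replace(",", "").replace(" ", ""):
            failures.append((src, hyp))
    return failures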

【5】 On the Copying Behaviors of Pre-Training for Neural Machine Translation

Authors: Xuebo Liu, Longyue Wang, Derek F. Wong, Liang Ding, Lidia S. Chao, Shuming Shi, Zhaopeng Tu
Affiliations: NLP2CT Lab, Department of Computer and Information Science, University of Macau; Tencent AI Lab; The University of Sydney
Note: Accepted to Findings of ACL 2021
Link: https://arxiv.org/abs/2107.08212
Abstract: Previous studies have shown that initializing neural machine translation (NMT) models with the pre-trained language models (LM) can speed up the model training and boost the model performance. In this work, we identify a critical side-effect of pre-training for NMT, which is due to the discrepancy between the training objectives of LM-based pre-training and NMT. Since the LM objective learns to reconstruct a few source tokens and copy most of them, the pre-training initialization would affect the copying behaviors of NMT models. We provide a quantitative analysis of copying behaviors by introducing a metric called copying ratio, which empirically shows that pre-training based NMT models have a larger copying ratio than the standard one. In response to this problem, we propose a simple and effective method named copying penalty to control the copying behaviors in decoding. Extensive experiments on both in-domain and out-of-domain benchmarks show that the copying penalty method consistently improves translation performance by controlling copying behaviors for pre-training based NMT models. Source code is freely available at https://github.com/SunbowLiu/CopyingPenalty.
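
The copying-ratio metric can be approximated in a few lines (a sketch of the idea; the paper's exact token-matching definition may differ):

def copying_ratio(source_tokens, output_tokens):
    # Fraction of generated tokens that also occur in the source sentence.
    src = set(source_tokens)
    copied = sum(1 for tok in output_tokens if tok in src)
    return copied / max(len(output_tokens), 1)

print(copying_ratio("the cat sat".split(), "the cat slept".split()))  # ~0.67

The proposed copying penalty then down-weights copy-heavy hypotheses during decoding, analogous to how a length penalty rescales hypothesis scores in beam search.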

【6】 Darmok and Jalad at Tanagra: A Dataset and Model for English-to-Tamarian Translation

Authors: Peter Jansen
Affiliations: School of Information, University of Arizona, USA
Link: https://arxiv.org/abs/2107.08146
Abstract: Tamarian, a fictional language introduced in the Star Trek episode Darmok, communicates meaning through utterances of metaphorical references, such as "Darmok and Jalad at Tanagra" instead of "We should work together." This work assembles a Tamarian-English dictionary of utterances from the original episode and several follow-on novels, and uses this to construct a parallel corpus of 456 English-Tamarian utterances. A machine translation system based on a large language model (T5) is trained using this parallel corpus, and is shown to produce an accuracy of 76% when translating from English to Tamarian on known utterances.
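
A fine-tuning step for such a T5-based translator could be sketched as below; the checkpoint size and task prefix are assumptions for illustration, while the training pair itself comes from the abstract:

from transformers import T5ForConditionalGeneration, T5Tokenizer

tok = T5Tokenizer.from_pretrained("t5-small")  # stand-in checkpoint
model = T5ForConditionalGeneration.from_pretrained("t5-small")

source = "translate English to Tamarian: We should work together."
target = "Darmok and Jalad at Tanagra."
input_ids = tok(source, return_tensors="pt").input_ids
labels = tok(target, return_tensors="pt").input_ids
loss = model(input_ids=input_ids, labels=labels).loss  # backpropagate this per pair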

Graph | Knowledge Graph | Knowledge (2 papers)

【1】 Pre-trained Language Models as Prior Knowledge for Playing Text-based Games

Authors: Ishika Singh, Gargi Singh, Ashutosh Modi
Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Kanpur (IITK), India
Note: 55 pages (8 pages main content + 2 pages references + 45 pages appendix)
Link: https://arxiv.org/abs/2107.08408
Abstract: Recently, text world games have been proposed to enable artificial agents to understand and reason about real-world scenarios. These text-based games are challenging for artificial agents, as it requires understanding and interaction using natural language in a partially observable environment. In this paper, we improve the semantic understanding of the agent by proposing a simple RL with LM framework where we use transformer-based language models with Deep RL models. We perform a detailed study of our framework to demonstrate how our model outperforms all existing agents on the popular game, Zork1, to achieve a score of 44.7, which is 1.6 higher than the state-of-the-art model. Our proposed approach also performs comparably to the state-of-the-art models on the other set of text games.

【2】 Proactive Retrieval-based Chatbots based on Relevant Knowledge and Goals

Authors: Yutao Zhu, Jian-Yun Nie, Kun Zhou, Pan Du, Hao Jiang, Zhicheng Dou
Affiliations: Université de Montréal, Montréal, Québec, Canada; School of Information, Renmin University of China, Beijing, China; Huawei Poisson Lab, Hangzhou, Zhejiang, China; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China
Note: Accepted by SIGIR 2021
Link: https://arxiv.org/abs/2107.08329
Abstract: A proactive dialogue system has the ability to proactively lead the conversation. Different from the general chatbots which only react to the user, proactive dialogue systems can be used to achieve some goals, e.g., to recommend some items to the user. Background knowledge is essential to enable smooth and natural transitions in dialogue. In this paper, we propose a new multi-task learning framework for retrieval-based knowledge-grounded proactive dialogue. To determine the relevant knowledge to be used, we frame knowledge prediction as a complementary task and use explicit signals to supervise its learning. The final response is selected according to the predicted knowledge, the goal to achieve, and the context. Experimental results show that explicit modeling of knowledge prediction and goal selection can greatly improve the final response selection. Our code is available at https://github.com/DaoD/KPN/.

Summarization | Information Extraction (1 paper)

【1】 MemSum: Extractive Summarization of Long Documents using Multi-step Episodic Markov Decision Processes

Authors: Nianlong Gu, Elliott Ash, Richard H. R. Hahnloser
Affiliations: Institute of Neuroinformatics, University of Zurich and ETH Zurich; Department of Humanities, Social and Political Sciences, ETH Zurich
Link: https://arxiv.org/abs/2107.08929
Abstract: We introduce MemSum (Multi-step Episodic Markov decision process extractive SUMmarizer), a reinforcement-learning-based extractive summarizer enriched at any given time step with information on the current extraction history. Similar to previous models in this vein, MemSum iteratively selects sentences into the summary. Our innovation is in considering a broader information set when summarizing that would intuitively also be used by humans in this task: 1) the text content of the sentence, 2) the global text context of the rest of the document, and 3) the extraction history consisting of the set of sentences that have already been extracted. With a lightweight architecture, MemSum nonetheless obtains state-of-the-art test-set performance (ROUGE score) on long document datasets (PubMed, arXiv, and GovReport). Supporting analysis demonstrates that the added awareness of extraction history gives MemSum robustness against redundancy in the source document.
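
Stripped of the learned policy, the iterative extraction loop has roughly this shape (the scoring function is a placeholder; MemSum learns it with reinforcement learning and also learns when to stop, which this sketch replaces with a fixed step budget):

def extract_summary(sentences, score, max_steps=5):
    # At each step, score the remaining sentences given (sentence content,
    # whole-document context, extraction history) and take the best one.
    history, remaining = [], list(range(len(sentences)))
    for _ in range(min(max_steps, len(sentences))):
        best = max(remaining, key=lambda i: score(sentences[i], sentences, history))
        history.append(best)
        remaining.remove(best)
    return [sentences[i] for i in history]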

Reasoning | Analysis | Understanding | Explanation (5 papers)

【1】 Analysing Cyberbullying using Natural Language Processing by Understanding Jargon in Social Media

Authors: Bhumika Bhatia, Anuj Verma, Anjum, Rahul Katarya
Affiliations: Delhi Technological University, New Delhi, India
Link: https://arxiv.org/abs/2107.08902
Abstract: Cyberbullying is of extreme prevalence today. Online-hate comments, toxicity, cyberbullying amongst children and other vulnerable groups are only growing over online classes, and increased access to social platforms, especially post COVID-19. It is paramount to detect and ensure minors' safety across social platforms so that any violence or hate-crime is automatically detected and strict action is taken against it. In our work, we explore binary classification by using a combination of datasets from various social media platforms that cover a wide range of cyberbullying such as sexism, racism, abusive, and hate-speech. We experiment through multiple models such as Bi-LSTM, GloVe, state-of-the-art models like BERT, and apply a unique preprocessing technique by introducing a slang-abusive corpus, achieving a higher precision in comparison to models without slang preprocessing.
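
The slang-preprocessing step can be illustrated with a toy lexicon lookup (the entries and lexicon below are invented; the paper builds a dedicated slang-abusive corpus for this purpose):

SLANG_LEXICON = {"u": "you", "r": "are", "gr8": "great"}  # toy entries

def normalize_slang(text):
    # Replace known slang tokens with their standard forms before classification.
    return " ".join(SLANG_LEXICON.get(tok.lower(), tok) for tok in text.split())

print(normalize_slang("u r gr8"))  # -> "you are great"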

【2】 Beyond a binary of (non)racist tweets: A four-dimensional categorical detection and analysis of racist and xenophobic opinions on Twitter in early Covid-19

Authors: Xin Pei, Deval Mehta
Affiliations: School of International Communications, University of Nottingham Ningbo China; Independent Researcher
Link: https://arxiv.org/abs/2107.08347
Abstract: Transcending the binary categorization of racist and xenophobic texts, this research takes cues from social science theories to develop a four dimensional category for racism and xenophobia detection, namely stigmatization, offensiveness, blame, and exclusion. With the aid of deep learning techniques, this categorical detection enables insights into the nuances of emergent topics reflected in racist and xenophobic expression on Twitter. Moreover, a stage wise analysis is applied to capture the dynamic changes of the topics across the stages of early development of Covid-19 from a domestic epidemic to an international public health emergency, and later to a global pandemic. The main contributions of this research include, first the methodological advancement. By bridging the state-of-the-art computational methods with social science perspective, this research provides a meaningful approach for future research to gain insight into the underlying subtlety of racist and xenophobic discussion on digital platforms. Second, by enabling a more accurate comprehension and even prediction of public opinions and actions, this research paves the way for the enactment of effective intervention policies to combat racist crimes and social exclusion under Covid-19.

【3】 M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis

Authors: Xingbo Wang, Jianben He, Zhihua Jin, Muqiao Yang, Huamin Qu
Note: 11 pages, 7 figures. This paper is accepted by IEEE VIS 2021. To appear in IEEE Transactions on Visualization and Computer Graphics (TVCG)
Link: https://arxiv.org/abs/2107.08264
Abstract: Multimodal sentiment analysis aims to recognize people's attitudes from multiple communication channels such as verbal content (i.e., text), voice, and facial expressions. It has become a vibrant and important research topic in natural language processing. Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels. However, current multimodal models with strong performance are often deep-learning-based techniques and work like black boxes. It is not clear how models utilize multimodal information for sentiment predictions. Despite recent advances in techniques for enhancing the explainability of machine learning models, they often target unimodal scenarios (e.g., images, sentences), and little research has been done on explaining multimodal models. In this paper, we present an interactive visual analytics system, M2Lens, to visualize and explain multimodal models for sentiment analysis. M2Lens provides explanations on intra- and inter-modal interactions at the global, subset, and local levels. Specifically, it summarizes the influence of three typical interaction types (i.e., dominance, complement, and conflict) on the model predictions. Moreover, M2Lens identifies frequent and influential multimodal features and supports the multi-faceted exploration of model behaviors from language, acoustic, and visual modalities. Through two case studies and expert interviews, we demonstrate our system can help users gain deep insights into the multimodal models for sentiment analysis.

【4】 The Law of Large Documents: Understanding the Structure of Legal Contracts Using Visual Cues

Authors: Allison Hegel, Marina Shah, Genevieve Peaslee, Brendan Roof, Emad Elwany
Affiliations: Lexion, Seattle, Washington, USA
Link: https://arxiv.org/abs/2107.08128
Abstract: Large, pre-trained transformer models like BERT have achieved state-of-the-art results on document understanding tasks, but most implementations can only consider 512 tokens at a time. For many real-world applications, documents can be much longer, and the segmentation strategies typically used on longer documents miss out on document structure and contextual information, hurting their results on downstream tasks. In our work on legal agreements, we find that visual cues such as layout, style, and placement of text in a document are strong features that are crucial to achieving an acceptable level of accuracy on long documents. We measure the impact of incorporating such visual cues, obtained via computer vision methods, on the accuracy of document understanding tasks including document segmentation, entity extraction, and attribute classification. Our method of segmenting documents based on structural metadata out-performs existing methods on four long-document understanding tasks as measured on the Contract Understanding Atticus Dataset.

【5】 Architectures of Meaning: A Systematic Corpus Analysis of NLP Systems

Authors: Oskar Wysocki, Malina Florea, Donal Landers, Andre Freitas
Affiliations: Department of Computer Science, The University of Manchester; digital Experimental Cancer Medicine Team, Cancer Biomarker Centre, CRUK Manchester Institute, University of Manchester; Idiap Research Institute
Note: 20 pages, 6 figures, 9 supplementary figures, Lexicon.txt in the appendix
Link: https://arxiv.org/abs/2107.08124
Abstract: This paper proposes a novel statistical corpus analysis framework targeted towards the interpretation of Natural Language Processing (NLP) architectural patterns at scale. The proposed approach combines saturation-based lexicon construction, statistical corpus analysis methods and graph collocations to induce a synthesis representation of NLP architectural patterns from corpora. The framework is validated in the full corpus of Semeval tasks and demonstrated coherent architectural patterns which can be used to answer architectural questions on a data-driven fashion, providing a systematic mechanism to interpret a largely dynamic and exponentially growing field.

Semi/Weakly/Unsupervised Learning | Uncertainty (2 papers)

【1】 Unsupervised Identification of Relevant Prior Cases

Authors: Shivangi Bithel, Sumitra S Malagi
Affiliations: Indian Institute of Technology, Hyderabad, Telangana, India
Note: Code: this https URL
Link: https://arxiv.org/abs/2107.08973
Abstract: Document retrieval has taken its role in almost all domains of knowledge understanding, including the legal domain. Precedent refers to a court decision that is considered as authority for deciding subsequent cases involving identical or similar facts or similar legal issues. In this work, we propose different unsupervised approaches to solve the task of identifying relevant precedents to a given query case. Our proposed approaches are using word embeddings like word2vec, doc2vec, and sent2vec, finding cosine similarity using TF-IDF, retrieving relevant documents using BM25 scores, using the pre-trained model and SBERT to find the most similar document, and using the product of BM25 and TF-IDF scores to find the most relevant document for a given query. We compared all the methods based on precision@10, recall@10, and MRR. Based on the comparative analysis, we found that the TF-IDF score multiplied by the BM25 score gives the best result. In this paper, we have also presented the analysis that we did to improve the BM25 score.
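
The best-scoring combination reported, BM25 multiplied by TF-IDF cosine similarity, is easy to reproduce in outline; the libraries chosen here (rank_bm25, scikit-learn) and the toy documents are my assumptions, not necessarily the authors' implementation:

from rank_bm25 import BM25Okapi
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = ["the court held the contract void", "the appeal was dismissed with costs"]
query = "court held contract void"

bm25_scores = BM25Okapi([d.split() for d in docs]).get_scores(query.split())
vec = TfidfVectorizer().fit(docs + [query])
tfidf_scores = cosine_similarity(vec.transform([query]), vec.transform(docs))[0]

combined = bm25_scores * tfidf_scores  # product of the two relevance signals
ranking = combined.argsort()[::-1]     # most relevant precedent first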

【2】 Bridging the Gap between Language Model and Reading Comprehension: Unsupervised MRC via Self-Supervision

Authors: Ning Bian, Xianpei Han, Bo Chen, Hongyu Lin, Ben He, Le Sun
Affiliations: School of Computer Science and Technology, University of Chinese Academy of Sciences, Beijing, China; Institute of Software, Chinese Academy of Sciences, Beijing, China
Link: https://arxiv.org/abs/2107.08582
Abstract: Despite recent success in machine reading comprehension (MRC), learning high-quality MRC models still requires large-scale labeled training data, even using strong pre-trained language models (PLMs). The pre-training tasks for PLMs are not question-answering or MRC-based tasks, making existing PLMs unable to be directly used for unsupervised MRC. Specifically, MRC aims to spot an accurate answer span from the given document, but PLMs focus on token filling in sentences. In this paper, we propose a new framework for unsupervised MRC. Firstly, we propose to learn to spot answer spans in documents via self-supervised learning, by designing a self-supervision pretext task for MRC - Spotting-MLM. Solving this task requires capturing deep interactions between sentences in documents. Secondly, we apply a simple sentence rewriting strategy in the inference stage to alleviate the expression mismatch between questions and documents. Experiments show that our method achieves a new state-of-the-art performance for unsupervised MRC.

Recognition/Classification (2 papers)

【1】 A pattern recognition approach for distinguishing between prose and poetry

Authors: Henrique F. de Arruda, Sandro M. Reia, Filipi N. Silva, Diego R. Amancio, Luciano da F. Costa
Affiliations: ISI Foundation, Turin, Italy; São Carlos Institute of Physics, University of São Paulo, São Carlos, Brazil; Indiana University Network Science Institute, Bloomington, Indiana, USA; Institute of Mathematics and Computer Sciences
Link: https://arxiv.org/abs/2107.08512
Abstract: Poetry and prose are written artistic expressions that help us to appreciate the reality we live. Each of these styles has its own set of subjective properties, such as rhyme and rhythm, which are easily caught by a human reader's eye and ear. With the recent advances in artificial intelligence, the gap between humans and machines may have decreased, and today we observe algorithms mastering tasks that were once exclusively performed by humans. In this paper, we propose an automated method to distinguish between poetry and prose based solely on aural and rhythmic properties. In order to compare prose and poetry rhythms, we represent the rhymes and phones as temporal sequences and thus we propose a procedure for extracting rhythmic features from these sequences. The classification of the considered texts using the set of features extracted resulted in a best accuracy of 0.78, obtained with a neural network. Interestingly, by using an approach based on complex networks to visualize the similarities between the different texts considered, we found that the patterns of poetry vary much more than prose. Consequently, a much richer and complex set of rhythmic possibilities tends to be found in that modality.

【2】 A Comparison of Methods for OOV-word Recognition on a New Public Dataset

Authors: Rudolf A. Braun, Srikanth Madikeri, Petr Motlicek
Affiliations: Idiap Research Institute, Martigny, Switzerland
Link: https://arxiv.org/abs/2107.08091
Abstract: A common problem for automatic speech recognition systems is how to recognize words that they did not see during training. Currently there is no established method of evaluating different techniques for tackling this problem. We propose using the CommonVoice dataset to create test sets for multiple languages which have a high out-of-vocabulary (OOV) ratio relative to a training set and release a new tool for calculating relevant performance metrics. We then evaluate, within the context of a hybrid ASR system, how much better subword models are at recognizing OOVs, and how much benefit one can get from incorporating OOV-word information into an existing system by modifying WFSTs. Additionally, we propose a new method for modifying a subword-based language model so as to better recognize OOV-words. We showcase very large improvements in OOV-word recognition and make both the data and code available.
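
Measuring the OOV ratio of a test set against a training vocabulary, the quantity the proposed test sets are built around, is straightforward; this is a sketch of the idea, not the released tool:

def oov_ratio(train_vocabulary, test_tokens):
    # Share of test tokens never seen in the training vocabulary.
    unseen = sum(1 for tok in test_tokens if tok not in train_vocabulary)
    return unseen / max(len(test_tokens), 1)

train_vocab = {"the", "cat", "sat"}
print(oov_ratio(train_vocab, "the zyzzyva sat".split()))  # ~0.33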

Representation Learning (1 paper)

【1】 Learning De-identified Representations of Prosody from Raw Audio

Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed
Link: https://arxiv.org/abs/2107.08248
Abstract: We propose a method for learning de-identified prosody representations from raw audio using a contrastive self-supervised signal. Whereas prior work has relied on conditioning models on bottlenecks, we introduce a set of inductive biases that exploit the natural structure of prosody to minimize timbral information and decouple prosody from speaker representations. Despite aggressive downsampling of the input and having no access to linguistic information, our model performs comparably to state-of-the-art speech representations on DAMMP, a new benchmark we introduce for spoken language understanding. We use minimum description length probing to show that our representations have selectively learned the subcomponents of non-timbral prosody, and that the product quantizer naturally disentangles them without using bottlenecks. We derive an information-theoretic definition of speech de-identifiability and use it to demonstrate that our prosody representations are less identifiable than other speech representations.

Word2Vec | Text | Words (1 paper)

【1】 Constructing Multi-Modal Dialogue Dataset by Replacing Text with Semantically Relevant Images

Authors: Nyoungwoo Lee, Suwon Shin, Jaegul Choo, Ho-Jin Choi, Sung-Hyun Myaeng
Affiliations: KAIST, Daejeon, South Korea
Note: Accepted by ACL 2021
Link: https://arxiv.org/abs/2107.08685
Abstract: In multi-modal dialogue systems, it is important to allow the use of images as part of a multi-turn conversation. Training such dialogue systems generally requires a large-scale dataset consisting of multi-turn dialogues that involve images, but such datasets rarely exist. In response, this paper proposes a 45k multi-modal dialogue dataset created with minimal human intervention. Our method to create such a dataset consists of (1) preparing and pre-processing text dialogue datasets, (2) creating image-mixed dialogues by using a text-to-image replacement technique, and (3) employing a contextual-similarity-based filtering step to ensure the contextual coherence of the dataset. To evaluate the validity of our dataset, we devise a simple retrieval model for dialogue sentence prediction tasks. Automatic metrics and human evaluation results on such tasks show that our dataset can be effectively used as training data for multi-modal dialogue systems which require an understanding of images and text in a context-aware manner. Our dataset and generation code is available at https://github.com/shh1574/multi-modal-dialogue-dataset.
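
The contextual-similarity filter can be sketched with sentence embeddings; the embedding model, the use of a caption as a text proxy for the image, and the threshold are all assumptions for illustration rather than the paper's procedure:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
context = "I just got back from the beach, the weather was perfect."
candidate_caption = "A photo of a sunny coastline."  # text proxy for the candidate image
similarity = util.cos_sim(model.encode(context), model.encode(candidate_caption)).item()
keep_replacement = similarity > 0.5  # threshold is an assumption, not from the paper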

Other Neural Networks | Deep Learning | Models | Modeling (1 paper)

【1】 Continual Learning for Task-oriented Dialogue System with Iterative Network Pruning, Expanding and Masking

Authors: Binzong Geng, Fajie Yuan, Qiancheng Xu, Ying Shen, Ruifeng Xu, Min Yang
Affiliations: University of Science and Technology of China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; Westlake University; Georgia Institute of Technology; Sun Yat-sen University; Harbin Institute of Technology (Shenzhen)
Note: Accepted by the Annual Meeting of the Association for Computational Linguistics (ACL), 2021
Link: https://arxiv.org/abs/2107.08173
Abstract: This ability to learn consecutive tasks without forgetting how to perform previously trained problems is essential for developing an online dialogue system. This paper proposes an effective continual learning for the task-oriented dialogue system with iterative network pruning, expanding and masking (TPEM), which preserves performance on previously encountered tasks while accelerating learning progress on subsequent tasks. Specifically, TPEM (i) leverages network pruning to keep the knowledge for old tasks, (ii) adopts network expanding to create free weights for new tasks, and (iii) introduces task-specific network masking to alleviate the negative impact of fixed weights of old tasks on new tasks. We conduct extensive experiments on seven different tasks from three benchmark datasets and show empirically that TPEM leads to significantly improved results over the strong competitors. For reproducibility, we submit the code and data at: https://github.com/siat-nlp/TPEM
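
The pruning step that preserves old-task knowledge can be illustrated with a magnitude-based mask (a generic sketch of iterative pruning, not TPEM's exact procedure):

import torch

def magnitude_prune_mask(weight, keep_ratio=0.5):
    # Keep the largest-magnitude weights for the old task; the zeroed
    # positions become free capacity that network expanding can reuse.
    k = max(1, min(int(weight.numel() * keep_ratio), weight.numel() - 1))
    threshold = weight.abs().flatten().kthvalue(weight.numel() - k).values
    return (weight.abs() > threshold).float()

mask = magnitude_prune_mask(torch.randn(4, 4))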

Other (8 papers)

【1】 E-PDDL: A Standardized Way of Defining Epistemic Planning Problems

Authors: Francesco Fabiano, Biplav Srivastava, Jonathan Lenchner, Lior Horesh, Francesca Rossi, Marianna Bergamaschi Ganapini
Note: 9 pages, Knowledge Engineering for Planning and Scheduling - ICAPS 2021
Link: https://arxiv.org/abs/2107.08739
Abstract: Epistemic Planning (EP) refers to an automated planning setting where the agent reasons in the space of knowledge states and tries to find a plan to reach a desirable state from the current state. Its general form, the Multi-agent Epistemic Planning (MEP) problem involves multiple agents who need to reason about both the state of the world and the information flow between agents. In a MEP problem, multiple approaches have been developed recently with varying restrictions, such as considering only the concept of knowledge while not allowing the idea of belief, or not allowing for "complex" modal operators such as those needed to handle dynamic common knowledge. While the diversity of approaches has led to a deeper understanding of the problem space, the lack of a standardized way to specify MEP problems independently of solution approaches has created difficulties in comparing performance of planners, identifying promising techniques, exploring new strategies like ensemble methods, and making it easy for new researchers to contribute to this research area. To address the situation, we propose a unified way of specifying EP problems - the Epistemic Planning Domain Definition Language, E-PDDL. We show that E-PPDL can be supported by leading MEP planners and provide corresponding parser code that translates EP problems specified in E-PDDL into (M)EP problems that can be handled by several planners. This work is also useful in building more general epistemic planning environments where we envision a meta-cognitive module that takes a planning problem in E-PDDL, identifies and assesses some of its features, and autonomously decides which planner is the best one to solve it.

【2】 Cobordisms and commutative categorial grammars

Authors: Sergey Slavnov
Note: This is the final version of the previously posted series of drafts on cobordisms and categorial grammars. Concise and, hopefully, much improved presentation, but no new mathematical content compared to preceding versions. arXiv admin note: text overlap with arXiv:1911.03962
Link: https://arxiv.org/abs/2107.08728
Abstract: We propose a concrete surface representation of abstract categorial grammars in the category of word cobordisms or cowordisms for short, which are certain bipartite graphs decorated with words in a given alphabet, generalizing linear logic proof-nets. We also introduce and study linear logic grammars, directly based on cobordisms and using classical multiplicative linear logic as a typing system.

【3】 Human-in-the-Loop for Data Collection: a Multi-Target Counter Narrative Dataset to Fight Online Hate Speech

Authors: Margherita Fanton, Helena Bonaldi, Serra Sinem Tekiroglu, Marco Guerini
Affiliations: University of Trento, Italy; Fondazione Bruno Kessler, Via Sommarive, Povo, Trento, Italy
Note: To appear at ACL 2021 (long paper)
Link: https://arxiv.org/abs/2107.08720
Abstract: Undermining the impact of hateful content with informed and non-aggressive responses, called counter narratives, has emerged as a possible solution for having healthier online communities. Thus, some NLP studies have started addressing the task of counter narrative generation. Although such studies have made an effort to build hate speech / counter narrative (HS/CN) datasets for neural generation, they fall short in reaching either high-quality and/or high-quantity. In this paper, we propose a novel human-in-the-loop data collection methodology in which a generative language model is refined iteratively by using its own data from the previous loops to generate new training samples that experts review and/or post-edit. Our experiments comprised several loops including dynamic variations. Results show that the methodology is scalable and facilitates diverse, novel, and cost-effective data collection. To our knowledge, the resulting dataset is the only expert-based multi-target HS/CN dataset available to the community.

【4】 Argument Linking: A Survey and Forecast

Authors: William Gantt
Note: An unpublished survey
Link: https://arxiv.org/abs/2107.08523
Abstract: Semantic role labeling (SRL) -- identifying the semantic relationships between a predicate and other constituents in the same sentence -- is a well-studied task in natural language understanding (NLU). However, many of these relationships are evident only at the level of the document, as a role for a predicate in one sentence may often be filled by an argument in a different one. This more general task, known as implicit semantic role labeling or argument linking, has received increased attention in recent years, as researchers have recognized its centrality to information extraction and NLU. This paper surveys the literature on argument linking and identifies several notable shortcomings of existing approaches that indicate the paths along which future research effort could most profitably be spent.

【5】 DeHumor: Visual Analytics for Decomposing Humor

Authors: Xingbo Wang, Yao Ming, Tongshuang Wu, Haipeng Zeng, Yong Wang, Huamin Qu
Affiliations: Zeng is with Sun Yat-sen University
Note: 15 pages. A preprint version of a publication at IEEE Transactions on Visualization and Computer Graphics (TVCG), 2021
Link: https://arxiv.org/abs/2107.08356
Abstract: Despite being a critical communication skill, grasping humor is challenging -- a successful use of humor requires a mixture of both engaging content build-up and an appropriate vocal delivery (e.g., pause). Prior studies on computational humor emphasize the textual and audio features immediately next to the punchline, yet overlooking longer-term context setup. Moreover, the theories are usually too abstract for understanding each concrete humor snippet. To fill in the gap, we develop DeHumor, a visual analytical system for analyzing humorous behaviors in public speaking. To intuitively reveal the building blocks of each concrete example, DeHumor decomposes each humorous video into multimodal features and provides inline annotations of them on the video script. In particular, to better capture the build-ups, we introduce content repetition as a complement to features introduced in theories of computational humor and visualize them in a context linking graph. To help users locate the punchlines that have the desired features to learn, we summarize the content (with keywords) and humor feature statistics on an augmented time matrix. With case studies on stand-up comedy shows and TED talks, we show that DeHumor is able to highlight various building blocks of humor examples. In addition, expert interviews with communication coaches and humor researchers demonstrate the effectiveness of DeHumor for multimodal humor analysis of speech content and vocal delivery.

【6】 Exploring the Potential of Lexical Paraphrases for Mitigating Noise-Induced Comprehension Errors

Authors: Anupama Chingacham, Vera Demberg, Dietrich Klakow
Affiliations: Saarland Informatics Campus, Saarland University, Germany
Note: Accepted at Interspeech 2021
Link: https://arxiv.org/abs/2107.08337
Abstract: Listening in noisy environments can be difficult even for individuals with a normal hearing thresholds. The speech signal can be masked by noise, which may lead to word misperceptions on the side of the listener, and overall difficulty to understand the message. To mitigate hearing difficulties on listeners, a co-operative speaker utilizes voice modulation strategies like Lombard speech to generate noise-robust utterances, and similar solutions have been developed for speech synthesis systems. In this work, we propose an alternate solution of choosing noise-robust lexical paraphrases to represent an intended meaning. Our results show that lexical paraphrases differ in their intelligibility in noise. We evaluate the intelligibility of synonyms in context and find that choosing a lexical unit that is less risky to be misheard than its synonym introduced an average gain in comprehension of 37% at SNR -5 dB and 21% at SNR 0 dB for babble noise.

【7】 Generative Pretraining for Paraphrase Evaluation

Authors: Jack Weston, Raphael Lenain, Udeepa Meepegama, Emil Fristed
Affiliations: Novoic
Note: Under review
Link: https://arxiv.org/abs/2107.08251
Abstract: We introduce ParaBLEU, a paraphrase representation learning model and evaluation metric for text generation. Unlike previous approaches, ParaBLEU learns to understand paraphrasis using generative conditioning as a pretraining objective. ParaBLEU correlates more strongly with human judgements than existing metrics, obtaining new state-of-the-art results on the 2017 WMT Metrics Shared Task. We show that our model is robust to data scarcity, exceeding previous state-of-the-art performance using only 50% of the available training data and surpassing BLEU, ROUGE and METEOR with only 40 labelled examples. Finally, we demonstrate that ParaBLEU can be used to conditionally generate novel paraphrases from a single demonstration, which we use to confirm our hypothesis that it learns abstract, generalized paraphrase representations.

【8】 Overview and Insights from the SciVer Shared Task on Scientific Claim Verification

Authors: David Wadden, Kyle Lo
Affiliations: University of Washington; Allen Institute for AI
Note: SciVer shared task, presented at the 2nd Scholarly Document Processing (SDP) workshop at NAACL 2021
Link: https://arxiv.org/abs/2107.08188
Abstract: We present an overview of the SciVer shared task, presented at the 2nd Scholarly Document Processing (SDP) workshop at NAACL 2021. In this shared task, systems were provided a scientific claim and a corpus of research abstracts, and asked to identify which articles SUPPORT or REFUTE the claim as well as provide evidentiary sentences justifying those labels. 11 teams made a total of 14 submissions to the shared task leaderboard, leading to an improvement of more than +23 F1 on the primary task evaluation metric. In addition to surveying the participating systems, we provide several insights into modeling approaches to support continued progress and future research on the important and challenging task of scientific claim verification.

Originally published 2021-07-20 on the WeChat official account "arXiv每日学术速递" (arXiv Daily Academic Digest).
