Natural Language Processing arXiv Digest [12.22]

Published 2021-12-24 by the WeChat public account arXiv每日学术速递.

cs.CL: 21 papers today.

Transformer (1 paper)

[1] Voice Quality and Pitch Features in Transformer-Based Speech Recognition
Link: https://arxiv.org/abs/2112.11391

Authors: Guillermo Cámbara, Jordi Luque, Mireia Farrús
Affiliations: TALN Research Group, Universitat Pompeu Fabra, Barcelona, Spain; Telefónica I+D, Research, Barcelona, Spain; Language and Computation Centre, Universitat de Barcelona, Spain
Comments: 5 pages, 3 figures, submitted to Speech Prosody 2022 conference
Abstract: Jitter and shimmer measurements have shown to be carriers of voice quality and prosodic information which enhance the performance of tasks like speaker recognition, diarization or automatic speech recognition (ASR). However, such features have been seldom used in the context of neural-based ASR, where spectral features often prevail. In this work, we study the effects of incorporating voice quality and pitch features altogether and separately to a Transformer-based ASR model, with the intuition that the attention mechanisms might exploit latent prosodic traits. For doing so, we propose separated convolutional front-ends for prosodic and spectral features, showing that this architectural choice yields better results than simple concatenation of such pitch and voice quality features to mel-spectrogram filterbanks. Furthermore, we find mean Word Error Rate relative reductions of up to 5.6% with the LibriSpeech benchmark. Such findings motivate further research on the application of prosody knowledge for increasing the robustness of Transformer-based ASR.
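The separated-front-end idea lends itself to a short PyTorch sketch. This is not the authors' code: the channel sizes, strides, and fusion by concatenation before a linear projection are all assumptions.

```python
import torch
import torch.nn as nn

class SeparatedFrontEnd(nn.Module):
    """Separate conv front-ends for spectral and prosodic streams, fused
    into a single frame sequence for a downstream Transformer encoder."""
    def __init__(self, n_mels=80, n_prosody=4, d_model=256):
        super().__init__()
        # Convolutional front-end for mel-spectrogram filterbanks
        self.spectral = nn.Sequential(
            nn.Conv1d(n_mels, d_model, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        # Smaller front-end for pitch / jitter / shimmer contours
        self.prosodic = nn.Sequential(
            nn.Conv1d(n_prosody, d_model // 4, kernel_size=3, stride=2, padding=1),
            nn.ReLU(),
        )
        self.proj = nn.Linear(d_model + d_model // 4, d_model)

    def forward(self, mels, prosody):
        # mels: (batch, n_mels, time); prosody: (batch, n_prosody, time)
        s = self.spectral(mels).transpose(1, 2)     # (batch, time', d_model)
        p = self.prosodic(prosody).transpose(1, 2)  # (batch, time', d_model // 4)
        return self.proj(torch.cat([s, p], dim=-1)) # fused frames for the Transformer
```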

BERT (1 paper)

[1] DB-BERT: a Database Tuning Tool that "Reads the Manual"
Link: https://arxiv.org/abs/2112.10925

Authors: Immanuel Trummer
Affiliations: Cornell University, Ithaca, New York
Abstract: DB-BERT is a database tuning tool that exploits information gained via natural language analysis of manuals and other relevant text documents. It uses text to identify database system parameters to tune as well as recommended parameter values. DB-BERT applies large, pre-trained language models (specifically, the BERT model) for text analysis. During an initial training phase, it fine-tunes model weights in order to translate natural language hints into recommended settings. At run time, DB-BERT learns to aggregate, adapt, and prioritize hints to achieve optimal performance for a specific database system and benchmark. Both phases are iterative and use reinforcement learning to guide the selection of tuning settings to evaluate (penalizing settings that the database system rejects while rewarding settings that improve performance). In our experiments, we leverage hundreds of text documents about database tuning as input for DB-BERT. We compare DB-BERT against various baselines, considering different benchmarks (TPC-C and TPC-H), metrics (throughput and run time), as well as database systems (Postgres and MySQL). In all cases, DB-BERT finds the best parameter settings among all compared methods. The code of DB-BERT is available online at https://itrummer.github.io/dbbert/.
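As a rough illustration of the run-time phase, here is a bandit-style caricature in Python: hints are translated to concrete settings, tried out, and scored, with rejected settings penalized. The callables `translate_hint` and `try_setting` are placeholders, not DB-BERT's actual API, and the real system learns an RL policy over BERT features rather than a per-hint value table.

```python
import random
from collections import defaultdict

def tune(hints, translate_hint, try_setting, episodes=100, eps=0.2):
    """Epsilon-greedy caricature of DB-BERT's run-time loop. Placeholders:
    translate_hint(hint) -> (parameter, value) via the fine-tuned LM;
    try_setting(parameter, value) -> benchmark score, or None if the DBMS
    rejects the setting."""
    value, counts = defaultdict(float), defaultdict(int)
    for _ in range(episodes):
        explore = random.random() < eps
        hint = random.choice(hints) if explore else max(hints, key=lambda h: value[h])
        parameter, val = translate_hint(hint)
        score = try_setting(parameter, val)
        reward = -1.0 if score is None else score  # penalize rejected settings
        counts[hint] += 1
        value[hint] += (reward - value[hint]) / counts[hint]  # running mean
    return max(value, key=value.get)  # hint whose settings performed best
```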

QA|VQA|Question Answering|Dialogue (2 papers)

[1] Task-oriented Dialogue Systems: performance vs. quality-optima, a review
Link: https://arxiv.org/abs/2112.11176

Authors: Ryan Fellows, Hisham Ihshaish, Steve Battle, Ciaran Haines, Peter Mayhew, J. Ignacio Deza
Affiliations: Computer Science Research Centre (CSRC), University of the West of England (UWE), Bristol, United Kingdom; GE Aviation, Cheltenham, United Kingdom; Universidad Atlántida Argentina, Mar del Plata, Argentina
Abstract: Task-oriented dialogue systems (TODS) are continuing to rise in popularity as various industries find ways to effectively harness their capabilities, saving both time and money. However, even state-of-the-art TODS are not yet reaching their full potential. TODS typically have a primary design focus on completing the task at hand, so the metric of task-resolution should take priority. Other conversational quality attributes that may point to the success, or otherwise, of the dialogue, may be ignored. This can cause interactions between human and dialogue system that leave the user dissatisfied or frustrated. This paper explores the literature on evaluative frameworks of dialogue systems and the role of conversational quality attributes in dialogue systems, looking at if, how, and where they are utilised, and examining their correlation with the performance of the dialogue system.

[2] An Inference Approach To Question Answering Over Knowledge Graphs
Link: https://arxiv.org/abs/2112.11070

Authors: Aayushee Gupta, K. M. Annervaz, Ambedkar Dukkipati, Shubhashis Sengupta
Affiliations: International Institute of Information Technology, Bangalore; Indian Institute of Science & Accenture Technology Labs
Comments: 10 pages, 4 figures, 4 tables
Abstract: Knowledge Graphs (KG) act as a great tool for holding distilled information from large natural language text corpora. The problem of natural language querying over knowledge graphs is essential for the human consumption of this information. This problem is typically addressed by converting the natural language query to a structured query and then firing the structured query on the KG. Direct answering models over knowledge graphs in literature are very few. The query conversion models and direct models both require specific training data pertaining to the domain of the knowledge graph. In this work, we convert the problem of natural language querying over knowledge graphs to an inference problem over premise-hypothesis pairs. Using trained deep learning models for the converted proxy inferencing problem, we provide the solution for the original natural language querying problem. Our method achieves over 90% accuracy on MetaQA dataset, beating the existing state-of-the-art. We also propose a model for inferencing called Hierarchical Recurrent Path Encoder(HRPE). The inferencing models can be fine-tuned to be used across domains with less training data. Our approach does not require large domain-specific training data for querying on new knowledge graphs from different domains.
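The premise-hypothesis framing can be sketched as follows; the verbalization template, the `[ANS]` slot, and the `nli_score` callable (an entailment scorer from any trained NLI model) are illustrative assumptions, not the paper's exact formulation.

```python
from typing import Callable, List, Tuple

def verbalize_path(path: List[Tuple[str, str, str]]) -> str:
    """Turn a KG path of (head, relation, tail) triples into a premise."""
    return " ".join(f"{h} {r.replace('_', ' ')} {t}." for h, r, t in path)

def answer_question(question: str,
                    candidate_paths: dict,
                    nli_score: Callable[[str, str], float]) -> str:
    """Treat each candidate's supporting KG path as the premise and the
    question with the candidate filled in as the hypothesis; keep the
    candidate whose pair the NLI model scores as most entailed."""
    best_answer, best_score = None, float("-inf")
    for answer, path in candidate_paths.items():
        premise = verbalize_path(path)
        hypothesis = question.replace("[ANS]", answer)  # hypothetical template slot
        score = nli_score(premise, hypothesis)          # entailment probability
        if score > best_score:
            best_answer, best_score = answer, score
    return best_answer

# e.g. a MetaQA-style question: "the films directed by Christopher Nolan include [ANS]"
candidates = {"Inception": [("Christopher Nolan", "directed", "Inception")]}
```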

Machine Translation (1 paper)

[1] Regularizing End-to-End Speech Translation with Triangular Decomposition Agreement
Link: https://arxiv.org/abs/2112.10991

Authors: Yichao Du, Zhirui Zhang, Weizhi Wang, Boxing Chen, Jun Xie, Tong Xu
Affiliations: University of Science and Technology of China, Hefei, China; Machine Intelligence Technology Lab, Alibaba DAMO Academy; Rutgers University, New Brunswick, USA
Comments: AAAI 2022
Abstract: End-to-end speech-to-text translation (E2E-ST) is becoming increasingly popular due to the potential of its less error propagation, lower latency, and fewer parameters. Given the triplet training corpus $\langle speech, transcription, translation\rangle$, the conventional high-quality E2E-ST system leverages the $\langle speech, transcription\rangle$ pair to pre-train the model and then utilizes the $\langle speech, translation\rangle$ pair to optimize it further. However, this process only involves two-tuple data at each stage, and this loose coupling fails to fully exploit the association between triplet data. In this paper, we attempt to model the joint probability of transcription and translation based on the speech input to directly leverage such triplet data. Based on that, we propose a novel regularization method for model training to improve the agreement of dual-path decomposition within triplet data, which should be equal in theory. To achieve this goal, we introduce two Kullback-Leibler divergence regularization terms into the model training objective to reduce the mismatch between output probabilities of dual-path. Then the well-trained model can be naturally transformed as the E2E-ST models by the pre-defined early stop tag. Experiments on the MuST-C benchmark demonstrate that our proposed approach significantly outperforms state-of-the-art E2E-ST baselines on all 8 language pairs, while achieving better performance in the automatic speech recognition task. Our code is open-sourced at https://github.com/duyichao/E2E-ST-TDA.
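A minimal sketch of the agreement objective, assuming both decomposition paths emit token-level distributions over the translation: two cross-entropy terms plus the two KL terms the abstract describes. The weighting factor `alpha` is an assumption.

```python
import torch.nn.functional as F

def tda_loss(logits_path1, logits_path2, targets, alpha=1.0):
    """Dual-path agreement objective: cross-entropy on the translation for
    both decomposition paths, plus two KL regularization terms pulling the
    paths' output distributions together.
    logits_*: (batch, seq, vocab); targets: (batch, seq) token ids."""
    ce1 = F.cross_entropy(logits_path1.transpose(1, 2), targets, ignore_index=-100)
    ce2 = F.cross_entropy(logits_path2.transpose(1, 2), targets, ignore_index=-100)
    logp = F.log_softmax(logits_path1, dim=-1)
    logq = F.log_softmax(logits_path2, dim=-1)
    # One KL term per direction, as in the abstract
    kl = F.kl_div(logp, logq, log_target=True, reduction="batchmean") \
       + F.kl_div(logq, logp, log_target=True, reduction="batchmean")
    return ce1 + ce2 + alpha * kl
```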

Graph|Knowledge Graph|Knowledge (2 papers)

[1] Controversy Detection: a Text and Graph Neural Network Based Approach
Link: https://arxiv.org/abs/2112.11445

Authors: Samy Benslimane, Jérôme Azé, Sandra Bringay, Maximilien Servajean, Caroline Mollevi
Affiliations: LIRMM UMR, CNRS, University of Montpellier, Montpellier, France; AMIS, Paul Valéry University, Montpellier, France; Institut du Cancer Montpellier (ICM), Montpellier, France; IDESP, UMR Inserm - Université de Montpellier, Montpellier, France
Abstract: Controversial content refers to any content that attracts both positive and negative feedback. Its automatic identification, especially on social media, is a challenging task as it should be done on a large number of continuously evolving posts, covering a large variety of topics. Most of the existing approaches rely on the graph structure of a topic-discussion and/or the content of messages. This paper proposes a controversy detection approach based on both graph structure of a discussion and text features. Our proposed approach relies on Graph Neural Network (gnn) to encode the graph representation (including its texts) in an embedding vector before performing a graph classification task. The latter will classify the post as controversial or not. Two controversy detection strategies are proposed. The first one is based on a hierarchical graph representation learning. Graph user nodes are embedded hierarchically and iteratively to compute the whole graph embedding vector. The second one is based on the attention mechanism, which allows each user node to give more or less importance to its neighbors when computing node embeddings. We conduct experiments to evaluate our approach using different real-world datasets. Conducted experiments show the positive impact of combining textual features and structural information in terms of performance.
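A minimal plain-PyTorch sketch of the graph-classification setup: node features are text embeddings of posts, neighbor states are mean-aggregated, and the pooled graph vector is classified as controversial or not. The aggregation scheme and layer sizes are assumptions, and the paper's hierarchical and attention variants are not shown.

```python
import torch
import torch.nn as nn

class ControversyGNN(nn.Module):
    """Mean-aggregation GNN over a discussion graph followed by a
    whole-graph pooling step and a binary classification head."""
    def __init__(self, in_dim=768, hid_dim=128, n_layers=2):
        super().__init__()
        dims = [in_dim] + [hid_dim] * n_layers
        self.layers = nn.ModuleList(
            nn.Linear(dims[i] * 2, dims[i + 1]) for i in range(n_layers))
        self.head = nn.Linear(hid_dim, 2)

    def forward(self, x, adj):
        # x: (n_nodes, in_dim) text features; adj: (n_nodes, n_nodes) 0/1 edges
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        for layer in self.layers:
            neigh = adj @ x / deg                  # mean over neighbor states
            x = torch.relu(layer(torch.cat([x, neigh], dim=-1)))
        return self.head(x.mean(dim=0))            # graph embedding -> 2 logits
```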

[2] Supervised Graph Contrastive Pretraining for Text Classification
Link: https://arxiv.org/abs/2112.11389

Authors: Samujjwal Ghosh, Subhadeep Maji, Maunendra Sankar Desarkar
Affiliations: IIT Hyderabad; Amazon
Comments: A condensed version of this paper has been accepted to ACM SAC'22. DOI: this https URL
Abstract: Contrastive pretraining techniques for text classification has been largely studied in an unsupervised setting. However, oftentimes labeled data from related tasks which share label semantics with current task is available. We hypothesize that using this labeled data effectively can lead to better generalization on current task. In this paper, we propose a novel way to effectively utilize labeled data from related tasks with a graph based supervised contrastive learning approach. We formulate a token-graph by extrapolating the supervised information from examples to tokens. Our formulation results in an embedding space where tokens with high/low probability of belonging to same class are near/further-away from one another. We also develop detailed theoretical insights which serve as a motivation for our method. In our experiments with 13 datasets, we show our method outperforms pretraining schemes by 2.5% and also example-level contrastive learning based formulation by 1.8% on average. In addition, we show cross-domain effectiveness of our method in a zero-shot setting by 3.91% on average. Lastly, we also demonstrate our method can be used as a noisy teacher in a knowledge distillation setting to significantly improve performance of transformer based models in low labeled data regime by 4.57% on average.
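The token-level supervised contrastive objective can be sketched as a standard SupCon loss over token embeddings, where tokens sharing a class label act as positives. The temperature and masking details are assumptions.

```python
import torch
import torch.nn.functional as F

def supcon_loss(emb, labels, tau=0.1):
    """Supervised contrastive loss: tokens sharing a class label are pulled
    together, others pushed apart.
    emb: (n, d) token embeddings; labels: (n,) class ids."""
    z = F.normalize(emb, dim=-1)
    sim = z @ z.t() / tau                               # (n, n) cosine / temperature
    self_mask = torch.eye(len(z), dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(self_mask, float("-inf"))     # drop self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    per_token = -log_prob.masked_fill(~pos, 0.0).sum(dim=1) \
                / pos.sum(dim=1).clamp(min=1)
    return per_token[pos.any(dim=1)].mean()             # tokens with >= 1 positive
```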

Summarization|Information Extraction (1 paper)

[1] Automated Drug-Related Information Extraction from French Clinical Documents: ReLyfe Approach
Link: https://arxiv.org/abs/2112.11439

Authors: Azzam Alwan, Maayane Attias, Larry Rubin, Adnan El Bakri
Affiliations: R&D Department, ReLyfe - Medical Intelligence, Paris, France; Computer Science, Ecole Polytechnique, Palaiseau, France; BeCareLink, New York, United States
Abstract: Structuring medical data in France remains a challenge mainly because of the lack of medical data due to privacy concerns and the lack of methods and approaches on processing the French language. One of these challenges is structuring drug-related information in French clinical documents. To our knowledge, over the last decade, there are less than five relevant papers that study French prescriptions. This paper proposes a new approach for extracting drug-related information from French clinical scanned documents while preserving patients' privacy. In addition, we deployed our method in a health data management platform where it is used to structure drug medical data and help patients organize their drug schedules. It can be implemented on any web or mobile platform. This work closes the gap between theoretical and practical work by creating an application adapted to real production problems. It is a combination of a rule-based phase and a Deep Learning approach. Finally, numerical results show the outperformance and relevance of the proposed methodology.

Reasoning|Analysis|Understanding|Interpretation (3 papers)

[1] Scaling Language Models: Methods, Analysis & Insights from Training Gopher
Link: https://arxiv.org/abs/2112.11446

Authors: Jack W. Rae, Sebastian Borgeaud, Trevor Cai, Katie Millican, Jordan Hoffmann, Francis Song, John Aslanides, Sarah Henderson, Roman Ring, Susannah Young, Eliza Rutherford, Tom Hennigan, Jacob Menick, Albin Cassirer, Richard Powell, George van den Driessche, Lisa Anne Hendricks, Maribeth Rauh, Po-Sen Huang, Amelia Glaese, Johannes Welbl, Sumanth Dathathri, Saffron Huang, Jonathan Uesato, John Mellor, Irina Higgins, Antonia Creswell, Nat McAleese, Amy Wu, Erich Elsen, Siddhant Jayakumar, Elena Buchatskaya, David Budden, Esme Sutherland, Karen Simonyan, Michela Paganini, Laurent Sifre, Lena Martens, Xiang Lorraine Li, Adhiguna Kuncoro, Aida Nematzadeh, Elena Gribovskaya, Domenic Donato, Angeliki Lazaridou, Arthur Mensch, Jean-Baptiste Lespiau, Maria Tsimpoukelli, Nikolai Grigorev, Doug Fritz, Thibault Sottiaux, Mantas Pajarskas, Toby Pohlen, Zhitao Gong, Daniel Toyama, Cyprien de Masson d'Autume, Yujia Li, Tayfun Terzi, Vladimir Mikulik, Igor Babuschkin, Aidan Clark, Diego de Las Casas, Aurelia Guy, Chris Jones, James Bradbury, Matthew Johnson, Blake Hechtman, Laura Weidinger, Iason Gabriel, William Isaac, Ed Lockhart, Simon Osindero, Laura Rimell, Chris Dyer, Oriol Vinyals, Kareem Ayoub, Jeff Stanway, Lorrayne Bennett, Demis Hassabis, Koray Kavukcuoglu, Geoffrey Irving
Comments: 118 pages
Abstract: Language modelling provides a step towards intelligent communication systems by harnessing large repositories of written human knowledge to better predict and understand the world. In this paper, we present an analysis of Transformer-based language model performance across a wide range of model scales -- from models with tens of millions of parameters up to a 280 billion parameter model called Gopher. These models are evaluated on 152 diverse tasks, achieving state-of-the-art performance across the majority. Gains from scale are largest in areas such as reading comprehension, fact-checking, and the identification of toxic language, but logical and mathematical reasoning see less benefit. We provide a holistic analysis of the training dataset and model's behaviour, covering the intersection of model scale with bias and toxicity. Finally we discuss the application of language models to AI safety and the mitigation of downstream harms.

[2] ESAN: Efficient Sentiment Analysis Network of A-Shares Research Reports for Stock Price Prediction
Link: https://arxiv.org/abs/2112.11444

Authors: Tuo Sun, Wanrong Zheng, Shufan Yu, Mengxun Li, Jiarui Ou
Abstract: In this paper, we are going to develop a natural language processing model to help us to predict stocks in the long term. The whole network includes two modules. The first module is a natural language processing model which seeks out reliable factors from input reports. While the other is a time-series forecasting model which takes the factors as input and aims to predict stocks earnings yield. To indicate the efficiency of our model to combine the sentiment analysis module and the time-series forecasting module, we name our method ESAN.

【3】 NLP Techniques for Water Quality Analysis in Social Media Content 标题:社交媒体内容中水质分析的NLP技术 链接:https://arxiv.org/abs/2112.11441

作者:Muhammad Asif Ayub,Khubaib Ahmad,Kashif Ahmad,Nasir Ahmad,Ala Al-Fuqaha 机构: Department of Computer Systems Engineering, University of Engineering and Technology, Peshawar, Pakistan., Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa, University, Qatar Foundation, Doha, Qatar. 备注:3 pages, 2 tables, 1 figure 摘要:本文介绍了我们对中世纪2021项工作的贡献,即“水MM:社会多媒体中的水质”。该任务旨在分析与水质相关的社交媒体帖子,特别关注水彩、气味、味道和相关疾病等方面。为此,提供了一个包含文本和视觉信息以及元数据的多模式数据集。考虑到可用内容的质量和数量,我们主要关注文本信息,分别采用三种不同的模型,并以后期融合的方式联合使用。这些模型包括(i)来自Transformer的双向编码器表示(BERT),(ii)稳健优化的BERT预训练方法(XLM-RoBERTa),以及(iii)在官方测试集上分别获得0.794、0.717和0.663 F1总分的自定义长短时记忆(LSTM)模型。在融合方案中,所有模型都被同等对待,与性能最好的单个模型相比,性能没有显著改善。 摘要:This paper presents our contributions to the MediaEval 2021 task namely "WaterMM: Water Quality in Social Multimedia". The task aims at analyzing social media posts relevant to water quality with particular focus on the aspects like watercolor, smell, taste, and related illnesses. To this aim, a multimodal dataset containing both textual and visual information along with meta-data is provided. Considering the quality and quantity of available content, we mainly focus on textual information by employing three different models individually and jointly in a late-fusion manner. These models include (i) Bidirectional Encoder Representations from Transformers (BERT), (ii) Robustly Optimized BERT Pre-training Approach (XLM-RoBERTa), and a (iii) custom Long short-term memory (LSTM) model obtaining an overall F1-score of 0.794, 0.717, 0.663 on the official test set, respectively. In the fusion scheme, all the models are treated equally and no significant improvement is observed in the performance over the best performing individual model.
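Since the abstract says all models are treated equally in the fusion scheme, the late fusion reduces to averaging the three models' class probabilities, e.g. this minimal sketch (the binary class layout is an assumption):

```python
import torch

def late_fusion(prob_bert, prob_xlmr, prob_lstm):
    """Equal-weight late fusion: average the three models' class
    probabilities, then take the argmax. Inputs: (batch, n_classes)."""
    return ((prob_bert + prob_xlmr + prob_lstm) / 3.0).argmax(dim=-1)

# Hypothetical usage with three (batch, 2) softmax outputs:
p1, p2, p3 = (torch.softmax(torch.randn(4, 2), dim=-1) for _ in range(3))
predictions = late_fusion(p1, p2, p3)
```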

GAN|Adversarial|Attack|Generation (1 paper)

[1] Contrast and Generation Make BART a Good Dialogue Emotion Recognizer
Link: https://arxiv.org/abs/2112.11202

Authors: Shimin Li, Hang Yan, Xipeng Qiu
Affiliations: School of Computer Science, Fudan University; Peng Cheng Laboratory, Shenzhen, Guangdong, China; Shanghai Key Laboratory of Intelligent Information Processing, Fudan University
Comments: Accepted by AAAI 2022
Abstract: In dialogue systems, utterances with similar semantics may have distinctive emotions under different contexts. Therefore, modeling long-range contextual emotional relationships with speaker dependency plays a crucial part in dialogue emotion recognition. Meanwhile, distinguishing the different emotion categories is non-trivial since they usually have semantically similar sentiments. To this end, we adopt supervised contrastive learning to make different emotions mutually exclusive to identify similar emotions better. Meanwhile, we utilize an auxiliary response generation task to enhance the model's ability of handling context information, thereby forcing the model to recognize emotions with similar semantics in diverse contexts. To achieve these objectives, we use the pre-trained encoder-decoder model BART as our backbone model since it is very suitable for both understanding and generation tasks. The experiments on four datasets demonstrate that our proposed model obtains significantly more favorable results than the state-of-the-art model in dialogue emotion recognition. The ablation study further demonstrates the effectiveness of supervised contrastive loss and generative loss.

Detection (2 papers)

[1] Fake News Detection Tools and Methods -- A Review
Link: https://arxiv.org/abs/2112.11185

Authors: Sakshini Hangloo, Bhavna Arora
Affiliations: Department of Computer Science & Information Technology, Central University of Jammu, Bagla (Rahya Suchani), District-Samba, Jammu, J&K, India
Abstract: In the past decade, the social networks platforms and micro-blogging sites such as Facebook, Twitter, Instagram, and Weibo have become an integral part of our day-to-day activities and is widely used all over the world by billions of users to share their views and circulate information in the form of messages, pictures, and videos. These are even used by government agencies to spread important information through their verified Facebook accounts and official Twitter handles, as they can reach a huge population within a limited time window. However, many deceptive activities like propaganda and rumor can mislead users on a daily basis. In these COVID times, fake news and rumors are very prevalent and are shared in a huge number which has created chaos in this tough time. And hence, the need for Fake News Detection in the present scenario is inevitable. In this paper, we survey the recent literature about different approaches to detect fake news over the Internet. In particular, we firstly discuss fake news and the various terms related to it that have been considered in the literature. Secondly, we highlight the various publicly available datasets and various online tools that are available and can debunk Fake News in real-time. Thirdly, we describe fake news detection methods based on two broader areas i.e., its content and the social context. Finally, we provide a comparison of various techniques that are used to debunk fake news.

[2] Watch Those Words: Video Falsification Detection Using Word-Conditioned Facial Motion
Link: https://arxiv.org/abs/2112.10936

Authors: Shruti Agarwal, Liwen Hu, Evonne Ng, Trevor Darrell, Hao Li, Anna Rohrbach
Affiliations: UC Berkeley; Pinscreen
Abstract: In today's era of digital misinformation, we are increasingly faced with new threats posed by video falsification techniques. Such falsifications range from cheapfakes (e.g., lookalikes or audio dubbing) to deepfakes (e.g., sophisticated AI media synthesis methods), which are becoming perceptually indistinguishable from real videos. To tackle this challenge, we propose a multi-modal semantic forensic approach to discover clues that go beyond detecting discrepancies in visual quality, thereby handling both simpler cheapfakes and visually persuasive deepfakes. In this work, our goal is to verify that the purported person seen in the video is indeed themselves by detecting anomalous correspondences between their facial movements and the words they are saying. We leverage the idea of attribution to learn person-specific biometric patterns that distinguish a given speaker from others. We use interpretable Action Units (AUs) to capture a persons' face and head movement as opposed to deep CNN visual features, and we are the first to use word-conditioned facial motion analysis. Unlike existing person-specific approaches, our method is also effective against attacks that focus on lip manipulation. We further demonstrate our method's effectiveness on a range of fakes not seen in training including those without video manipulation, that were not addressed in prior work.

Recognition/Classification (2 papers)

[1] Mixed Precision Low-bit Quantization of Neural Network Language Models for Speech Recognition
Link: https://arxiv.org/abs/2112.11438

Authors: Junhao Xu, Jianwei Yu, Shoukang Hu, Xunying Liu, Helen Meng
Affiliations: The Chinese University of Hong Kong
Abstract: State-of-the-art language models (LMs) represented by long-short term memory recurrent neural networks (LSTM-RNNs) and Transformers are becoming increasingly complex and expensive for practical applications. Low-bit neural network quantization provides a powerful solution to dramatically reduce their model size. Current quantization methods are based on uniform precision and fail to account for the varying performance sensitivity at different parts of LMs to quantization errors. To this end, novel mixed precision neural network LM quantization methods are proposed in this paper. The optimal local precision choices for LSTM-RNN and Transformer based neural LMs are automatically learned using three techniques. The first two approaches are based on quantization sensitivity metrics in the form of either the KL-divergence measured between full precision and quantized LMs, or Hessian trace weighted quantization perturbation that can be approximated efficiently using matrix free techniques. The third approach is based on mixed precision neural architecture search. In order to overcome the difficulty in using gradient descent methods to directly estimate discrete quantized weights, alternating direction methods of multipliers (ADMM) are used to efficiently train quantized LMs. Experiments were conducted on state-of-the-art LF-MMI CNN-TDNN systems featuring speed perturbation, i-Vector and learning hidden unit contribution (LHUC) based speaker adaptation on two tasks: Switchboard telephone speech and AMI meeting transcription. The proposed mixed precision quantization techniques achieved "lossless" quantization on both tasks, by producing model size compression ratios of up to approximately 16 times over the full precision LSTM and Transformer baseline LMs, while incurring no statistically significant word error rate increase.
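The KL-divergence sensitivity metric can be sketched as follows: quantize one layer of a copy of the LM and measure the divergence between the two models' output distributions. The min-max uniform quantizer and the single-batch estimate are simplifying assumptions, not the paper's exact procedure.

```python
import copy
import torch
import torch.nn.functional as F

def fake_quantize_(module, n_bits):
    """Uniform min-max quantization of a module's weights, in place
    (a stand-in for the paper's quantizer)."""
    for p in module.parameters():
        lo, hi = p.min(), p.max()
        scale = ((hi - lo) / (2 ** n_bits - 1)).clamp(min=1e-8)
        p.data = torch.round((p.data - lo) / scale) * scale + lo

@torch.no_grad()
def kl_sensitivity(lm, layer_name, n_bits, batch):
    """KL divergence between the full-precision LM's output distribution
    and that of a copy with one layer quantized -- a proxy for how
    aggressively that layer can be quantized. Assumes lm(batch) -> logits."""
    ref = F.log_softmax(lm(batch), dim=-1)
    quantized = copy.deepcopy(lm)
    fake_quantize_(dict(quantized.named_modules())[layer_name], n_bits)
    hyp = F.log_softmax(quantized(batch), dim=-1)
    return F.kl_div(hyp, ref, log_target=True, reduction="batchmean")
```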

[2] Predicting Job Titles from Job Descriptions with Multi-label Text Classification
Link: https://arxiv.org/abs/2112.11052

Authors: Hieu Trung Tran, Hanh Hong Phuc Vo, Son T. Luu
Affiliations: University of Information Technology, Vietnam National University, Ho Chi Minh City, Vietnam
Comments: Accepted by the NAFOSTED Conference on Information and Computer Science (NICS 2021)
Abstract: Finding a suitable job and hunting for eligible candidates are important to job seeking and human resource agencies. With the vast information about job descriptions, employees and employers need assistance to automatically detect job titles based on job description texts. In this paper, we propose the multi-label classification approach for predicting relevant job titles from job description texts, and implement the Bi-GRU-LSTM-CNN with different pre-trained language models to apply for the job titles prediction problem. The BERT with multilingual pre-trained model obtains the highest result by F1-scores on both development and test sets, which are 62.20% on the development set, and 47.44% on the test set.
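A multi-label formulation assigns each job title its own sigmoid output trained with binary cross-entropy; the sketch below assumes a generic encoder feature vector standing in for the paper's Bi-GRU-LSTM-CNN over pre-trained embeddings.

```python
import torch
import torch.nn as nn

class JobTitleHead(nn.Module):
    """Multi-label head: one sigmoid per job title, trained with binary
    cross-entropy against multi-hot targets."""
    def __init__(self, encoder_dim, n_titles):
        super().__init__()
        self.classifier = nn.Linear(encoder_dim, n_titles)
        self.loss_fn = nn.BCEWithLogitsLoss()

    def forward(self, features, targets=None, threshold=0.5):
        logits = self.classifier(features)         # (batch, n_titles)
        if targets is not None:                    # multi-hot (batch, n_titles)
            return self.loss_fn(logits, targets.float())
        return torch.sigmoid(logits) > threshold   # predicted set of titles
```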

Retrieval (1 paper)

[1] On Cross-Lingual Retrieval with Multilingual Text Encoders
Link: https://arxiv.org/abs/2112.11031

Authors: Robert Litschko, Ivan Vulić, Simone Paolo Ponzetto, Goran Glavaš
Comments: to appear in IRJ ECIR 2021 Special Issue. arXiv admin note: substantial text overlap with arXiv:2101.08370
Abstract: In this work we present a systematic empirical study focused on the suitability of the state-of-the-art multilingual encoders for cross-lingual document and sentence retrieval tasks across a number of diverse language pairs. We first treat these models as multilingual text encoders and benchmark their performance in unsupervised ad-hoc sentence- and document-level CLIR. In contrast to supervised language understanding, our results indicate that for unsupervised document-level CLIR -- a setup with no relevance judgments for IR-specific fine-tuning -- pretrained multilingual encoders on average fail to significantly outperform earlier models based on CLWEs. For sentence-level retrieval, we do obtain state-of-the-art performance: the peak scores, however, are met by multilingual encoders that have been further specialized, in a supervised fashion, for sentence understanding tasks, rather than using their vanilla 'off-the-shelf' variants. Following these results, we introduce localized relevance matching for document-level CLIR, where we independently score a query against document sections. In the second part, we evaluate multilingual encoders fine-tuned in a supervised fashion (i.e., we learn to rank) on English relevance data in a series of zero-shot language and domain transfer CLIR experiments. Our results show that supervised re-ranking rarely improves the performance of multilingual transformers as unsupervised base rankers. Finally, only with in-domain contrastive fine-tuning (i.e., same domain, only language transfer), we manage to improve the ranking quality. We uncover substantial empirical differences between cross-lingual retrieval results and results of (zero-shot) cross-lingual transfer for monolingual retrieval in target languages, which point to "monolingual overfitting" of retrieval models trained on monolingual data.
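Localized relevance matching can be sketched as scoring the query against each document section independently and keeping the best section score; the cosine scoring and max-pooling over sections are assumptions about the aggregation.

```python
import torch
import torch.nn.functional as F

def localized_relevance(encode, query, doc_sections):
    """Score a query against each document section independently and keep
    the best section score. `encode` is any multilingual sentence encoder
    mapping a string to a fixed-size vector."""
    q = F.normalize(encode(query), dim=-1)
    sections = torch.stack([encode(s) for s in doc_sections])
    return (F.normalize(sections, dim=-1) @ q).max()  # best-matching section

# Documents are then ranked by this per-document max-section score.
```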

Word2Vec|Text|Words (1 paper)

[1] An ASP-based Approach to Answering Natural Language Questions for Texts
Link: https://arxiv.org/abs/2112.11241

Authors: Dhruva Pendharkar, Kinjal Basu, Farhad Shakerin, Gopal Gupta
Affiliations: The University of Texas at Dallas
Abstract: An approach based on answer set programming (ASP) is proposed in this paper for representing knowledge generated from natural language texts. Knowledge in a text is modeled using a Neo Davidsonian-like formalism, which is then represented as an answer set program. Relevant commonsense knowledge is additionally imported from resources such as WordNet and represented in ASP. The resulting knowledge-base can then be used to perform reasoning with the help of an ASP system. This approach can facilitate many natural language tasks such as automated question answering, text summarization, and automated question generation. ASP-based representation of techniques such as default reasoning, hierarchical knowledge organization, preferences over defaults, etc., are used to model commonsense reasoning methods required to accomplish these tasks. In this paper, we describe the CASPR system that we have developed to automate the task of answering natural language questions given English text. CASPR can be regarded as a system that answers questions by "understanding" the text and has been tested on the SQuAD data set, with promising results.

Others (3 papers)

[1] Deliberation of Streaming RNN-Transducer by Non-autoregressive Decoding
Link: https://arxiv.org/abs/2112.11442

Authors: Weiran Wang, Ke Hu, Tara N. Sainath
Affiliations: Google, Inc.
Abstract: We propose to deliberate the hypothesis alignment of a streaming RNN-T model with the previously proposed Align-Refine non-autoregressive decoding method and its improved versions. The method performs a few refinement steps, where each step shares a transformer decoder that attends to both text features (extracted from alignments) and audio features, and outputs complete updated alignments. The transformer decoder is trained with the CTC loss which facilitates parallel greedy decoding, and performs full-context attention to capture label dependencies. We improve Align-Refine by introducing cascaded encoder that captures more audio context before refinement, and alignment augmentation which enforces learning label dependency. We show that, conditioned on hypothesis alignments of a streaming RNN-T model, our method obtains significantly more accurate recognition results than the first-pass RNN-T, with only small amount of model parameters.
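The refinement loop can be sketched as follows, assuming a shared decoder callable that attends to both text and audio features and a frame-parallel greedy (CTC-style) readout; `text_embed` and `decoder` are placeholder interfaces, not the paper's implementation.

```python
import torch

@torch.no_grad()
def align_refine(first_pass_alignment, audio_feats, text_embed, decoder, n_steps=2):
    """Iterative deliberation sketch: start from the streaming RNN-T
    alignment; each step re-decodes a full updated alignment conditioned on
    text features of the current alignment plus audio features."""
    alignment = first_pass_alignment                 # (batch, frames) token ids
    for _ in range(n_steps):
        logits = decoder(text_embed(alignment), audio_feats)
        alignment = logits.argmax(dim=-1)            # all frames updated in parallel
    return alignment
```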

[2] Lyric document embeddings for music tagging
Link: https://arxiv.org/abs/2112.11436

Authors: Matt McVicar, Bruno Di Giorgi, Baris Dundar, Matthias Mauch
Affiliations: Apple
Abstract: We present an empirical study on embedding the lyrics of a song into a fixed-dimensional feature for the purpose of music tagging. Five methods of computing token-level and four methods of computing document-level representations are trained on an industrial-scale dataset of tens of millions of songs. We compare simple averaging of pretrained embeddings to modern recurrent and attention-based neural architectures. Evaluating on a wide range of tagging tasks such as genre classification, explicit content identification and era detection, we find that averaging word embeddings outperform more complex architectures in many downstream metrics.
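The strong averaging baseline is just a mean over pretrained token vectors; a minimal sketch, assuming `word_vectors` maps token to a NumPy array of size `dim` (e.g. word2vec):

```python
import numpy as np

def average_lyric_embedding(lyrics, word_vectors, dim=300):
    """Fixed-size lyric feature by averaging pretrained token embeddings --
    the simple baseline the study finds competitive for tagging."""
    vecs = [word_vectors[t] for t in lyrics.lower().split() if t in word_vectors]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)
```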

[3] How are cities pledging net zero? A computational approach to analyzing subnational climate strategies
Link: https://arxiv.org/abs/2112.11207

Authors: Siddharth Sachdeva, Angel Hsu, Ian French, Elwin Lim
Affiliations: Data-Driven EnviroLab, University of North Carolina-Chapel Hill, Chapel Hill, NC; Yale-NUS College, Singapore
Comments: 14 pages, 6 figures, submitted to nature urban sustainability
Abstract: Cities have become primary actors on climate change and are increasingly setting goals aimed at net-zero emissions. The rapid proliferation of subnational governments "racing to zero" emissions and articulating their own climate mitigation plans warrants closer examination to understand how these actors intend to meet these goals. The scattered, incomplete and heterogeneous nature of city climate policy documents, however, has made their systemic analysis challenging. We analyze 318 climate action documents from cities that have pledged net-zero targets or joined a transnational climate initiative with this goal using machine learning-based natural language processing (NLP) techniques. We use these approaches to accomplish two primary goals: 1) determine text patterns that predict "ambitious" net-zero targets, where we define an ambitious target as one that encompasses a subnational government's economy-wide emissions; and 2) perform a sectoral analysis to identify patterns and trade-offs in climate action themes (i.e., land-use, industry, buildings, etc.). We find that cities that have defined ambitious climate actions tend to emphasize quantitative metrics and specific high-emitting sectors in their plans, supported by mentions of governance and citizen participation. Cities predominantly emphasize energy-related actions in their plans, particularly in the buildings, transport and heating sectors, but often at the expense of other sectors, including land-use and climate impacts. The method presented in this paper provides a replicable, scalable approach to analyzing climate action plans and a first step towards facilitating cross-city learning.
