
Natural Language Processing Academic Digest [12.10]

Author: arXiv Daily Academic Digest (公众号-arXiv每日学术速递)
Published: 2021-12-10 17:08:16


cs.CL: 17 papers today

Transformer (3 papers)

[1] Transferring BERT-like Transformers' Knowledge for Authorship Verification
Link: https://arxiv.org/abs/2112.05125

Authors: Andrei Manolache, Florin Brad, Elena Burceanu, Antonio Barbalau, Radu Ionescu, Marius Popescu
Affiliations: Bitdefender, University of Bucharest
Comments: 16 pages, 3 figures
Abstract: The task of identifying the author of a text spans several decades and was tackled using linguistics, statistics, and, more recently, machine learning. Inspired by the impressive performance gains across a broad range of natural language processing tasks and by the recent availability of the PAN large-scale authorship dataset, we first study the effectiveness of several BERT-like transformers for the task of authorship verification. Such models prove to achieve very high scores consistently. Next, we empirically show that they focus on topical clues rather than on author writing style characteristics, taking advantage of existing biases in the dataset. To address this problem, we provide new splits for PAN-2020, where training and test data are sampled from disjoint topics or authors. Finally, we introduce DarkReddit, a dataset with a different input data distribution. We further use it to analyze the domain generalization performance of models in a low-data regime and how performance varies when using the proposed PAN-2020 splits for fine-tuning. We show that those splits can enhance the models' capability to transfer knowledge over a new, significantly different dataset.
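A minimal sketch of the setup implied above, framing authorship verification as binary classification over text pairs with a BERT-like encoder. This is our illustration, not the authors' released code: the checkpoint, texts, and label convention are assumptions, and the classification head is untrained until fine-tuned on author-pair data such as the PAN-2020 splits.

```python
# Hypothetical sketch: authorship verification as binary pair classification.
# Checkpoint and label convention are assumptions; the head must first be
# fine-tuned on author-pair data (e.g., PAN-2020 splits) to be meaningful.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-cased", num_labels=2)  # 0 = different author, 1 = same author

text_a = "First document, written by some author ..."
text_b = "Second document, possibly by the same author ..."

# Encode the pair jointly so self-attention can compare both texts directly.
inputs = tokenizer(text_a, text_b, truncation=True, max_length=512,
                   return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(torch.softmax(logits, dim=-1)[0, 1].item())  # P(same author)
```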

[2] Opinion Extraction as A Structured Sentiment Analysis using Transformers
Link: https://arxiv.org/abs/2112.05056

Authors: Yucheng Liu, Tian Zhu
Affiliations: Natural Language Processing, UC Berkeley School of Information
Abstract: Relationship extraction and named entity recognition have always been considered as two distinct tasks that require different input data, labels, and models. However, both are essential for structured sentiment analysis. We believe that both tasks can be combined into a single stacked model with the same input data. We performed different experiments to find the best model to extract multiple opinion tuples from a single sentence. The opinion tuples will consist of holders, targets, and expressions. With the opinion tuples, we will be able to extract the relationship we need.

[3] A Bilingual, OpenWorld Video Text Dataset and End-to-end Video Text Spotter with Transformer
Link: https://arxiv.org/abs/2112.04888

Authors: Weijia Wu, Yuanqiang Cai, Debing Zhang, Sibo Wang, Zhuang Li, Jiahong Li, Yejun Tang, Hong Zhou
Affiliations: Zhejiang University, Beijing University of Posts and Telecommunications, Kuaishou Technology
Comments: None
Abstract: Most existing video text spotting benchmarks focus on evaluating a single language and scenario with limited data. In this work, we introduce a large-scale, Bilingual, Open World Video text benchmark dataset (BOVText). There are four features for BOVText. Firstly, we provide 2,000+ videos with more than 1,750,000+ frames, 25 times larger than the existing largest dataset with incidental text in videos. Secondly, our dataset covers 30+ open categories with a wide selection of various scenarios, e.g., Life Vlog, Driving, Movie, etc. Thirdly, abundant text type annotations (i.e., title, caption or scene text) are provided for the different representational meanings in video. Fourthly, BOVText provides bilingual text annotation to promote multiple cultures live and communication. Besides, we propose an end-to-end video text spotting framework with Transformer, termed TransVTSpotter, which solves the multi-orient text spotting in video with a simple, but efficient attention-based query-key mechanism. It applies object features from the previous frame as a tracking query for the current frame and introduces a rotation angle prediction to fit the multi-orient text instance. On ICDAR2015 (video), TransVTSpotter achieves state-of-the-art performance with 44.1% MOTA at 9 fps. The dataset and code of TransVTSpotter can be found at github.com/weijiawu/BOVText and github.com/weijiawu/TransVTSpotter, respectively.
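The query-key tracking idea can be pictured with a toy sketch: features of text instances from the previous frame act as queries matched against detections in the current frame. This is an illustrative assumption of the general mechanism (scaled dot-product matching), not TransVTSpotter's actual implementation; all names and shapes are invented.

```python
# Toy sketch of query-key tracking (an assumption, not the paper's code):
# previous-frame instance features query current-frame detection features.
import torch
import torch.nn.functional as F

def track_by_attention(prev_feats, curr_feats):
    """prev_feats: (M, d) tracked text instances; curr_feats: (N, d) detections."""
    d = prev_feats.size(-1)
    scores = prev_feats @ curr_feats.T / d ** 0.5  # (M, N) scaled dot products
    assign = F.softmax(scores, dim=-1)             # soft assignment per track
    return assign.argmax(dim=-1)                   # best current match per track

prev = torch.randn(3, 256)  # 3 instances tracked from the previous frame
curr = torch.randn(5, 256)  # 5 detections in the current frame
print(track_by_attention(prev, curr))  # e.g., tensor([4, 0, 2])
```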

Semantic Analysis (2 papers)

[1] Semantic Search as Extractive Paraphrase Span Detection
Link: https://arxiv.org/abs/2112.04886

Authors: Jenna Kanerva, Hanna Kitti, Li-Hsin Chang, Teemu Vahtola, Mathias Creutz, Filip Ginter
Affiliations: TurkuNLP, Department of Computing, University of Turku, Finland; Department of Digital Humanities, University of Helsinki, Finland
Abstract: In this paper, we approach the problem of semantic search by framing the search task as paraphrase span detection, i.e. given a segment of text as a query phrase, the task is to identify its paraphrase in a given document, the same modelling setup as typically used in extractive question answering. On the Turku Paraphrase Corpus of 100,000 manually extracted Finnish paraphrase pairs including their original document context, we find that our paraphrase span detection model outperforms two strong retrieval baselines (lexical similarity and BERT sentence embeddings) by 31.9pp and 22.4pp respectively in terms of exact match, and by 22.3pp and 12.9pp in terms of token-level F-score. This demonstrates a strong advantage of modelling the task in terms of span retrieval, rather than sentence similarity. Additionally, we introduce a method for creating artificial paraphrase data through back-translation, suitable for languages where manually annotated paraphrase resources for training the span detection model are not available.
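Since the task is modelled exactly like extractive QA, the setup can be sketched with a stock question-answering pipeline, where the query phrase plays the role of the question and the document the context. The checkpoint below is a generic placeholder, not the authors' Finnish model, and the texts are invented.

```python
# Sketch of paraphrase span detection as extractive QA (placeholder model).
from transformers import pipeline

span_detector = pipeline("question-answering",
                         model="deepset/roberta-base-squad2")  # placeholder

query_phrase = "the weather was unusually warm"
document = ("We arrived in Turku in late May. For that time of year it was "
            "surprisingly hot outside, so we spent the evenings by the sea.")

# The query phrase is the "question"; the returned span is its paraphrase.
result = span_detector(question=query_phrase, context=document)
print(result["answer"], result["score"])
```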

[2] Prompt-based Zero-shot Relation Classification with Semantic Knowledge Augmentation
Link: https://arxiv.org/abs/2112.04539

Authors: Jiaying Gong, Hoda Eldardiry
Affiliations: Virginia Tech, Blacksburg, U.S.
Comments: 11 pages, 7 figures

Graph | Knowledge Graph | Knowledge (2 papers)

[1] KGE-CL: Contrastive Learning of Knowledge Graph Embeddings
Link: https://arxiv.org/abs/2112.04871

Authors: Wentao Xu, Zhiping Luo, Weiqing Liu, Jiang Bian, Jian Yin, Tie-Yan Liu
Affiliations: School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, China; Microsoft Research Asia, Beijing, China; School of Artificial Intelligence, Sun Yat-sen University, Zhuhai, China
Abstract: Learning the embeddings of knowledge graphs is vital in artificial intelligence, and can benefit various downstream applications, such as recommendation and question answering. In recent years, many research efforts have been proposed for knowledge graph embedding. However, most previous knowledge graph embedding methods ignore the semantic similarity between the related entities and entity-relation couples in different triples since they separately optimize each triple with the scoring function. To address this problem, we propose a simple yet efficient contrastive learning framework for knowledge graph embeddings, which can shorten the semantic distance of the related entities and entity-relation couples in different triples and thus improve the expressiveness of knowledge graph embeddings. We evaluate our proposed method on three standard knowledge graph benchmarks. It is noteworthy that our method can yield some new state-of-the-art results, achieving 51.2% MRR, 46.8% Hits@1 on the WN18RR dataset, and 59.1% MRR, 51.8% Hits@1 on the YAGO3-10 dataset.
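An InfoNCE-style contrastive loss illustrates the core idea of pulling together embeddings of related entities or entity-relation couples from different triples while pushing unrelated ones apart. This is a generic sketch under assumed shapes, not KGE-CL's exact objective.

```python
# Generic contrastive-loss sketch (not KGE-CL's exact objective).
import torch
import torch.nn.functional as F

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    """anchor/positive: (d,) related couples; negatives: (K, d) unrelated."""
    pos = F.cosine_similarity(anchor, positive, dim=0) / temperature
    neg = F.cosine_similarity(anchor.unsqueeze(0), negatives, dim=1) / temperature
    logits = torch.cat([pos.unsqueeze(0), neg])  # positive sits at index 0
    target = torch.zeros(1, dtype=torch.long)
    return F.cross_entropy(logits.unsqueeze(0), target)

d = 200
anchor = torch.randn(d)         # e.g., embedding of a (head, relation) couple
positive = torch.randn(d)       # related couple from another triple
negatives = torch.randn(64, d)  # sampled unrelated couples
print(contrastive_loss(anchor, positive, negatives))
```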

[2] Refined Commonsense Knowledge from Large-Scale Web Contents
Link: https://arxiv.org/abs/2112.04596

Authors: Tuan-Phong Nguyen, Simon Razniewski, Julien Romero, Gerhard Weikum
Comments: This is a substantial extension of the WWW paper (arXiv:2011.00905). arXiv admin note: substantial text overlap with arXiv:2011.00905
Abstract: Commonsense knowledge (CSK) about concepts and their properties is useful for AI applications. Prior works like ConceptNet, COMET and others compiled large CSK collections, but are restricted in their expressiveness to subject-predicate-object (SPO) triples with simple concepts for S and strings for P and O. This paper presents a method, called ASCENT++, to automatically build a large-scale knowledge base (KB) of CSK assertions, with refined expressiveness and both better precision and recall than prior works. ASCENT++ goes beyond SPO triples by capturing composite concepts with subgroups and aspects, and by refining assertions with semantic facets. The latter is important to express the temporal and spatial validity of assertions and further qualifiers. ASCENT++ combines open information extraction with judicious cleaning and ranking by typicality and saliency scores. For high coverage, our method taps into the large-scale crawl C4 with broad web contents. The evaluation with human judgements shows the superior quality of the ASCENT++ KB, and an extrinsic evaluation for QA-support tasks underlines the benefits of ASCENT++. A web interface, data and code can be accessed at https://www.mpi-inf.mpg.de/ascentpp.

Reasoning | Analysis | Understanding | Explanation (1 paper)

[1] PTR: A Benchmark for Part-based Conceptual, Relational, and Physical Reasoning
Link: https://arxiv.org/abs/2112.05136

Authors: Yining Hong, Li Yi, Joshua B. Tenenbaum, Antonio Torralba, Chuang Gan
Affiliations: UCLA, Stanford University, MIT BCS, CBMM, CSAIL, MIT CSAIL, MIT-IBM Watson AI Lab
Comments: NeurIPS 2021. Project page: this http URL
Abstract: A critical aspect of human visual perception is the ability to parse visual scenes into individual objects and further into object parts, forming part-whole hierarchies. Such composite structures could induce a rich set of semantic concepts and relations, thus playing an important role in the interpretation and organization of visual signals as well as for the generalization of visual perception and reasoning. However, existing visual reasoning benchmarks mostly focus on objects rather than parts. Visual reasoning based on the full part-whole hierarchy is much more challenging than object-centric reasoning due to finer-grained concepts, richer geometry relations, and more complex physics. Therefore, to better serve for part-based conceptual, relational and physical reasoning, we introduce a new large-scale diagnostic visual reasoning dataset named PTR. PTR contains around 70k RGBD synthetic images with ground truth object and part level annotations regarding semantic instance segmentation, color attributes, spatial and geometric relationships, and certain physical properties such as stability. These images are paired with 700k machine-generated questions covering various reasoning types, making them a good testbed for visual reasoning models. We examine several state-of-the-art visual reasoning models on this dataset and observe that they still make many surprising mistakes in situations where humans can easily infer the correct answer. We believe this dataset will open up new opportunities for part-based reasoning.

GAN | Adversarial | Attack | Generation (1 paper)

[1] Self-Supervised Image-to-Text and Text-to-Image Synthesis
Link: https://arxiv.org/abs/2112.04928

Authors: Anindya Sundar Das, Sriparna Saha
Affiliations: Department of Computer Science and Engineering, Indian Institute of Technology Patna, India
Comments: None
Abstract: A comprehensive understanding of vision and language and their interrelation are crucial to realize the underlying similarities and differences between these modalities and to learn more generalized, meaningful representations. In recent years, most of the works related to Text-to-Image synthesis and Image-to-Text generation focused on supervised generative deep architectures to solve the problems, where very little interest was placed on learning the similarities between the embedding spaces across modalities. In this paper, we propose a novel self-supervised deep learning based approach towards learning the cross-modal embedding spaces; for both image to text and text to image generations. In our approach, we first obtain dense vector representations of images using StackGAN-based autoencoder model and also dense vector representations on sentence-level utilizing LSTM based text-autoencoder; then we study the mapping from embedding space of one modality to embedding space of the other modality utilizing GAN and maximum mean discrepancy based generative networks. We also demonstrate that our model learns to generate textual description from image data as well as images from textual data both qualitatively and quantitatively.
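One ingredient named in the abstract, maximum mean discrepancy (MMD), can be sketched directly; below is a minimal RBF-kernel MMD between batches from the two embedding spaces, with the bandwidth and shapes as illustrative assumptions.

```python
# Minimal RBF-kernel MMD sketch (bandwidth and shapes are assumptions).
import torch

def rbf_mmd(x, y, bandwidth=1.0):
    """x: (n, d) mapped image embeddings; y: (m, d) text embeddings."""
    def kernel(a, b):
        return torch.exp(-torch.cdist(a, b) ** 2 / (2 * bandwidth ** 2))
    return kernel(x, x).mean() + kernel(y, y).mean() - 2 * kernel(x, y).mean()

x = torch.randn(32, 128)
y = torch.randn(32, 128)
print(rbf_mmd(x, y))  # near zero when the two distributions match
```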

Detection (3 papers)

[1] Multimodal Fake News Detection
Link: https://arxiv.org/abs/2112.04831

Authors: Santiago Alonso-Bartolome, Isabel Segura-Bedmar
Affiliations: Computer Science Department, Universidad Carlos III de Madrid, Avenida de la Universidad, Leganés, Madrid, Spain
Abstract: Over the last years, there has been an unprecedented proliferation of fake news. As a consequence, we are more susceptible to the pernicious impact that misinformation and disinformation spreading can have in different segments of our society. Thus, the development of tools for automatic detection of fake news plays an important role in the prevention of its negative effects. Most attempts to detect and classify false content focus only on using textual information. Multimodal approaches are less frequent and they typically classify news either as true or fake. In this work, we perform a fine-grained classification of fake news on the Fakeddit dataset, using both unimodal and multimodal approaches. Our experiments show that the multimodal approach based on a Convolutional Neural Network (CNN) architecture combining text and image data achieves the best results, with an accuracy of 87%. Some fake news categories such as Manipulated content, Satire or False connection strongly benefit from the use of images. Using images also improves the results of the other categories, but with less impact. Regarding the unimodal approaches using only text, Bidirectional Encoder Representations from Transformers (BERT) is the best model with an accuracy of 78%. Therefore, exploiting both text and image data significantly improves the performance of fake news detection.
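A minimal late-fusion sketch of a CNN combining text and image inputs for fine-grained fake-news labels. This is our assumption of a typical design, not the paper's exact architecture; all layer sizes, the vocabulary size, and the six-way label count are illustrative.

```python
# Hypothetical late-fusion CNN sketch (sizes and design are assumptions).
import torch
import torch.nn as nn

class MultimodalFakeNewsClassifier(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, num_classes=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.text_cnn = nn.Sequential(          # 1D conv over token embeddings
            nn.Conv1d(emb_dim, 128, kernel_size=3, padding=1),
            nn.ReLU(), nn.AdaptiveMaxPool1d(1))
        self.image_cnn = nn.Sequential(         # toy image branch
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.classifier = nn.Linear(128 + 32, num_classes)

    def forward(self, token_ids, image):
        t = self.embed(token_ids).transpose(1, 2)  # (B, emb, L)
        t = self.text_cnn(t).squeeze(-1)           # (B, 128) text features
        v = self.image_cnn(image)                  # (B, 32) image features
        return self.classifier(torch.cat([t, v], dim=1))  # late fusion

model = MultimodalFakeNewsClassifier()
logits = model(torch.randint(0, 30000, (2, 64)), torch.randn(2, 3, 224, 224))
print(logits.shape)  # torch.Size([2, 6])
```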

[2] Combining Textual Features for the Detection of Hateful and Offensive Language
Link: https://arxiv.org/abs/2112.04803

Authors: Sherzod Hakimov, Ralph Ewerth
Affiliations: TIB - Leibniz Information Centre for Science and Technology, Hannover, Germany; Leibniz University Hannover, L3S Research Center, Hannover, Germany
Comments: HASOC 2021, Forum for Information Retrieval Evaluation, 2021
Abstract: The detection of offensive, hateful and profane language has become a critical challenge since many users in social networks are exposed to cyberbullying activities on a daily basis. In this paper, we present an analysis of combining different textual features for the detection of hateful or offensive posts on Twitter. We provide a detailed experimental evaluation to understand the impact of each building block in a neural network architecture. The proposed architecture is evaluated on the English Subtask 1A: Identifying Hate, offensive and profane content from the post datasets of HASOC-2021 dataset under the team name TIB-VA. We compared different variants of the contextual word embeddings combined with the character level embeddings and the encoding of collected hate terms.

[3] Detecting Potentially Harmful and Protective Suicide-related Content on Twitter: A Machine Learning Approach
Link: https://arxiv.org/abs/2112.04796

Authors: Hannah Metzler, Hubert Baginski, Thomas Niederkrotenthaler, David Garcia
Affiliations: Complexity Science Hub Vienna, Austria; Section for Science of Complex Systems, Center for Medical Statistics, Informatics and Intelligent Systems, Medical University of Vienna, Austria
Abstract: Research shows that exposure to suicide-related news media content is associated with suicide rates, with some content characteristics likely having harmful and others potentially protective effects. Although good evidence exists for a few selected characteristics, systematic large scale investigations are missing in general, and in particular for social media data. We apply machine learning methods to automatically label large quantities of Twitter data. We developed a novel annotation scheme that classifies suicide-related tweets into different message types and problem- vs. solution-focused perspectives. We then trained a benchmark of machine learning models including a majority classifier, an approach based on word frequency (TF-IDF with a linear SVM) and two state-of-the-art deep learning models (BERT, XLNet). The two deep learning models achieved the best performance in two classification tasks: First, we classified six main content categories, including personal stories about either suicidal ideation and attempts or coping, calls for action intending to spread either problem awareness or prevention-related information, reportings of suicide cases, and other suicide-related and off-topic tweets. The deep learning models reach accuracy scores above 73% on average across the six categories, and F1-scores in between 69% and 85% for all but the suicidal ideation and attempts category (55%). Second, in separating postings referring to actual suicide from off-topic tweets, they correctly labelled around 88% of tweets, with BERT achieving F1-scores of 93% and 74% for the two categories. These classification performances are comparable to the state-of-the-art on similar tasks. By making data labeling more efficient, this work enables future large-scale investigations on harmful and protective effects of various kinds of social media content on suicide rates and on help-seeking behavior.
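The word-frequency baseline named above (TF-IDF features with a linear SVM) is straightforward to sketch with scikit-learn; the texts and labels below are invented stand-ins for the annotated tweet categories, not the study's data.

```python
# Sketch of the TF-IDF + linear SVM baseline (texts/labels are invented).
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["tweet about coping with hard times ...",
         "news report about a suicide case ...",
         "completely unrelated tweet ..."]
labels = ["coping", "case_report", "off_topic"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LinearSVC())
clf.fit(texts, labels)                       # fit on annotated tweets
print(clf.predict(["another tweet reporting a case ..."]))
```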

Zero/Few/One-Shot | Transfer | Adaptation (1 paper)

[1] Few-Shot NLU with Vector Projection Distance and Abstract Triangular CRF
Link: https://arxiv.org/abs/2112.04999

Authors: Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu
Affiliations: AISpeech Co., Ltd., Suzhou, China; X-LANCE Lab, Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China
Comments: Accepted by NLPCC 2021
Abstract: Data sparsity problem is a key challenge of Natural Language Understanding (NLU), especially for a new target domain. By training an NLU model in source domains and applying the model to an arbitrary target domain directly (even without fine-tuning), few-shot NLU becomes crucial to mitigate the data scarcity issue. In this paper, we propose to improve prototypical networks with vector projection distance and abstract triangular Conditional Random Field (CRF) for the few-shot NLU. The vector projection distance exploits projections of contextual word embeddings on label vectors as word-label similarities, which is equivalent to a normalized linear model. The abstract triangular CRF learns domain-agnostic label transitions for joint intent classification and slot filling tasks. Extensive experiments demonstrate that our proposed methods can significantly surpass strong baselines. Specifically, our approach can achieve a new state-of-the-art on two few-shot NLU benchmarks (Few-Joint and SNIPS) in Chinese and English without fine-tuning on target domains.
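The vector projection distance can be written down in a few lines: the similarity between a contextual word embedding and each label vector is the scalar projection of the embedding onto that vector, which is why it behaves like a normalized linear model. The dimensions below are illustrative assumptions.

```python
# Sketch of vector projection similarity: |x| cos(theta) per label vector.
import torch

def projection_similarity(x, W):
    """x: (d,) contextual word embedding; W: (num_labels, d) label vectors."""
    return (W @ x) / W.norm(dim=1)  # scalar projection of x onto each label

x = torch.randn(768)                # one token's contextual embedding
W = torch.randn(10, 768)            # 10 slot/intent label vectors (assumed)
print(projection_similarity(x, W).argmax())  # predicted label index
```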

Other (4 papers)

[1] How Universal is Genre in Universal Dependencies?
Link: https://arxiv.org/abs/2112.04971

Authors: Max Müller-Eberstein, Rob van der Goot, Barbara Plank
Affiliations: Department of Computer Science, IT University of Copenhagen, Denmark
Comments: Accepted at SyntaxFest 2021
Abstract: This work provides the first in-depth analysis of genre in Universal Dependencies (UD). In contrast to prior work on genre identification which uses small sets of well-defined labels in mono-/bilingual setups, UD contains 18 genres with varying degrees of specificity spread across 114 languages. As most treebanks are labeled with multiple genres while lacking annotations about which instances belong to which genre, we propose four methods for predicting instance-level genre using weak supervision from treebank metadata. The proposed methods recover instance-level genre better than competitive baselines as measured on a subset of UD with labeled instances and adhere better to the global expected distribution. Our analysis sheds light on prior work using UD genre metadata for treebank selection, finding that metadata alone are a noisy signal and must be disentangled within treebanks before it can be universally applied.

[2] A Simple but Effective Bidirectional Extraction Framework for Relational Triple Extraction
Link: https://arxiv.org/abs/2112.04940

Authors: Feiliang Ren, Longhui Zhang, Xiaofeng Zhao, Shujuan Yin, Shilei Liu, Bochao Li
Affiliations: Northeastern University, Shenyang, China
Comments: WSDM2022
Abstract: Tagging based relational triple extraction methods are attracting growing research attention recently. However, most of these methods take a unidirectional extraction framework that first extracts all subjects and then extracts objects and relations simultaneously based on the subjects extracted. This framework has an obvious deficiency that it is too sensitive to the extraction results of subjects. To overcome this deficiency, we propose a bidirectional extraction framework based method that extracts triples based on the entity pairs extracted from two complementary directions. Concretely, we first extract all possible subject-object pairs from two paralleled directions. These two extraction directions are connected by a shared encoder component, thus the extraction features from one direction can flow to another direction and vice versa. In this way, the extractions of two directions can boost and complement each other. Next, we assign all possible relations for each entity pair by a biaffine model. During training, we observe that the share structure will lead to a convergence rate inconsistency issue which is harmful to performance. So we propose a share-aware learning mechanism to address it. We evaluate the proposed model on multiple benchmark datasets. Extensive experimental results show that the proposed model is very effective and it achieves state-of-the-art results on all of these datasets. Moreover, experiments show that both the proposed bidirectional extraction framework and the share-aware learning mechanism have good adaptability and can be used to improve the performance of other tagging based methods. The source code of our work is available at: https://github.com/neukg/BiRTE.
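The biaffine relation assignment step can be sketched as a bilinear term plus a linear term over a subject-object representation pair. This is a generic biaffine scorer under assumed sizes, not the authors' released module (their code is at the URL above).

```python
# Generic biaffine relation scorer sketch (sizes are assumptions).
import torch
import torch.nn as nn

class BiaffineRelationScorer(nn.Module):
    def __init__(self, hidden=256, num_relations=24):
        super().__init__()
        # U: bilinear interaction term; W: linear term over the concatenated pair
        self.U = nn.Parameter(torch.randn(num_relations, hidden, hidden) * 0.01)
        self.W = nn.Linear(2 * hidden, num_relations)

    def forward(self, subj, obj):
        """subj, obj: (B, hidden) -> (B, num_relations) relation scores."""
        bilinear = torch.einsum("bi,rij,bj->br", subj, self.U, obj)
        linear = self.W(torch.cat([subj, obj], dim=-1))
        return bilinear + linear

scorer = BiaffineRelationScorer()
scores = scorer(torch.randn(4, 256), torch.randn(4, 256))
print(scores.shape)  # torch.Size([4, 24])
```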

[3] Nice perfume. How long did you marinate in it? Multimodal Sarcasm Explanation
Link: https://arxiv.org/abs/2112.04873

Authors: Poorav Desai, Tanmoy Chakraborty, Md Shad Akhtar
Affiliations: Indraprastha Institute of Information Technology, Delhi (IIIT Delhi), India
Comments: Accepted for publication in AAAI-2022
Abstract: Sarcasm is a pervading linguistic phenomenon and highly challenging to explain due to its subjectivity, lack of context and deeply-felt opinion. In the multimodal setup, sarcasm is conveyed through the incongruity between the text and visual entities. Although recent approaches deal with sarcasm as a classification problem, it is unclear why an online post is identified as sarcastic. Without proper explanation, end users may not be able to perceive the underlying sense of irony. In this paper, we propose a novel problem -- Multimodal Sarcasm Explanation (MuSE) -- given a multimodal sarcastic post containing an image and a caption, we aim to generate a natural language explanation to reveal the intended sarcasm. To this end, we develop MORE, a new dataset with explanation of 3510 sarcastic multimodal posts. Each explanation is a natural language (English) sentence describing the hidden irony. We benchmark MORE by employing a multimodal Transformer-based architecture. It incorporates a cross-modal attention in the Transformer's encoder which attends to the distinguishing features between the two modalities. Subsequently, a BART-based auto-regressive decoder is used as the generator. Empirical results demonstrate convincing results over various baselines (adopted for MuSE) across five evaluation metrics. We also conduct human evaluation on predictions and obtain a Fleiss' Kappa score of 0.4 as a fair agreement among 25 evaluators.
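The cross-modal attention the encoder uses can be pictured with a small sketch: text token states attend over visual region features so that incongruities between the two modalities surface in the fused representation, which then feeds a BART-style decoder. This is a generic illustration with invented shapes, not the paper's benchmark code.

```python
# Generic cross-modal attention sketch (shapes are invented assumptions).
import torch
import torch.nn as nn

cross_attn = nn.MultiheadAttention(embed_dim=768, num_heads=8, batch_first=True)

text_states = torch.randn(2, 40, 768)    # (B, text_len, d) caption tokens
image_regions = torch.randn(2, 36, 768)  # (B, regions, d) visual features

# Text queries attend to image keys/values, exposing incongruent features.
fused, attn_weights = cross_attn(query=text_states,
                                 key=image_regions,
                                 value=image_regions)
print(fused.shape)  # torch.Size([2, 40, 768]) -> fed to a BART-style decoder
```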

[4] Towards Neural Functional Program Evaluation
Link: https://arxiv.org/abs/2112.04630

Authors: Torsten Scholak, Jonathan Pilault, Joey Velez-Ginorio
Affiliations: ElementAI (a ServiceNow company); Polytechnique Montreal & Mila; University of Pennsylvania
Comments: 9 pages. Accepted at the AIPLANS workshop at NeurIPS 2021
Abstract: This paper explores the capabilities of current transformer-based language models for program evaluation of simple functional programming languages. We introduce a new program generation mechanism that allows control over syntactic sugar for semantically equivalent programs. T5 experiments reveal that neural functional program evaluation performs surprisingly well, achieving high 90% exact program match scores for most in-distribution and out-of-distribution tests. Using pretrained T5 weights has significant advantages over random initialization. We present and evaluate on three datasets to study generalization abilities that are specific to functional programs based on: type, function composition, and reduction steps. Code and data are publicly available at https://github.com/ElementAI/neural-interpreters.
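Conceptually, program evaluation here is a plain seq2seq task: the model reads a functional program as text and emits its evaluated result. The sketch below uses an off-the-shelf T5 checkpoint and an invented program syntax; it would only produce the expected result after fine-tuning on program/result pairs such as the authors' datasets.

```python
# Seq2seq program-evaluation sketch (checkpoint and syntax are assumptions).
from transformers import T5Tokenizer, T5ForConditionalGeneration

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

program = "evaluate: map (lambda x -> x + 1) [1, 2, 3]"
inputs = tokenizer(program, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
# After fine-tuning on program/result pairs, the target output is "[2, 3, 4]".
```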

