
NLP Academic Digest [7.28]

By arXiv每日学术速递 (WeChat official account)
Published 2021-07-29 14:22:17

Visit www.arxivdaily.com for digests with abstracts, covering CS, physics, mathematics, economics, statistics, finance, biology, and electrical engineering, plus search, bookmarking, and posting features.

cs.CL: 13 papers today

QA | VQA | Question Answering | Dialogue (2 papers)

【1】 QA Dataset Explosion: A Taxonomy of NLP Resources for Question Answering and Reading Comprehension

Authors: Anna Rogers, Matt Gardner, Isabelle Augenstein
Affiliations: University of Copenhagen (Denmark), Allen Institute for Artificial Intelligence
Comments: Under review
Link: https://arxiv.org/abs/2107.12708
Abstract: Alongside huge volumes of research on deep learning models in NLP in recent years, there has also been much work on the benchmark datasets needed to track modeling progress. Question answering and reading comprehension have been particularly prolific in this regard, with over 80 new datasets appearing in the past two years. This study is the largest survey of the field to date. We provide an overview of the various formats and domains of the current resources, highlighting the current lacunae for future work. We further discuss the current classifications of "reasoning types" in question answering and propose a new taxonomy. We also discuss the implications of over-focusing on English, and survey the current monolingual resources for other languages and multilingual resources. The study is aimed both at practitioners looking for pointers to the wealth of existing data, and at researchers working on new resources.

【2】 Greedy Gradient Ensemble for Robust Visual Question Answering

Authors: Xinzhe Han, Shuhui Wang, Chi Su, Qingming Huang, Qi Tian
Affiliations: Key Lab of Intell. Info. Process., Inst. of Comput. Tech., CAS, Beijing, China; University of Chinese Academy of Sciences, Beijing, China; Kingsoft Cloud, Beijing, China; Peng Cheng Laboratory, Shenzhen, China; Cloud BU, Huawei Technologies, Shenzhen, China
Comments: Accepted by ICCV 2021. Code: this https URL
Link: https://arxiv.org/abs/2107.12651
Abstract: Language bias is a critical issue in Visual Question Answering (VQA), where models often exploit dataset biases for the final decision without considering the image information. As a result, they suffer from performance drops on out-of-distribution data and inadequate visual explanation. Based on experimental analysis of existing robust VQA methods, we stress that the language bias in VQA comes from two aspects, i.e., distribution bias and shortcut bias. We further propose a new de-bias framework, Greedy Gradient Ensemble (GGE), which combines multiple biased models for unbiased base model learning. With the greedy strategy, GGE forces the biased models to over-fit the biased data distribution first, thus making the base model pay more attention to examples that are hard to solve with the biased models. The experiments demonstrate that our method makes better use of visual information and achieves state-of-the-art performance on the diagnostic dataset VQA-CP without using extra annotations.
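The greedy idea in the GGE abstract — let the biased models absorb whatever the bias can already explain, so the base model's gradient concentrates on the remaining hard examples — can be caricatured with a residual-style example weight. This is a hedged simplification with illustrative names, not the paper's actual GGE objective:

```python
import math

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def gge_base_weight(biased_logit_correct: float) -> float:
    """Illustrative only: in a greedy-ensemble-style de-biasing scheme, the more
    confidently the biased model already explains an example (high logit on the
    correct answer), the less gradient weight the base model receives for it.
    The weight here is the residual of the biased fit, 1 - sigmoid(logit)."""
    return 1.0 - sigmoid(biased_logit_correct)

# An example the language bias solves easily gets a small base-model weight;
# an example the bias gets wrong gets a large one.
easy = gge_base_weight(4.0)   # bias is confident and correct
hard = gge_base_weight(-4.0)  # bias fails on this example
```

A usage note: in training, such a weight would multiply the base model's per-example loss, so the unbiased model focuses on what the biased models cannot fit.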

Semi-/Weakly-/Unsupervised | Uncertainty (1 paper)

【1】 Unsupervised Domain Adaptation for Hate Speech Detection Using a Data Augmentation Approach

Authors: Sheikh Muhammad Sarwar, Vanessa Murdock
Affiliations: CIIR, College of Information and Computer Sciences, University of Massachusetts Amherst
Link: https://arxiv.org/abs/2107.12866
Abstract: Online harassment in the form of hate speech has been on the rise in recent years. Addressing the issue requires a combination of content moderation by people, aided by automatic detection methods. As content moderation is itself harmful to the people doing it, we desire to reduce the burden by improving the automatic detection of hate speech. Hate speech presents a challenge as it is directed at different target groups using a completely different vocabulary. Further, the authors of the hate speech are incentivized to disguise their behavior to avoid being removed from a platform. This makes it difficult to develop a comprehensive dataset for training and evaluating hate speech detection models, because the examples that represent one hate speech domain do not typically represent others, even within the same language or culture. We propose an unsupervised domain adaptation approach to augment labeled data for hate speech detection. We evaluate the approach with three different models (character CNNs, BiLSTMs, and BERT) on three different collections. We show our approach improves the area under the precision/recall curve by as much as 42% and recall by as much as 278%, with no loss (and in some cases a significant gain) in precision.

Detection (2 papers)

【1】 Emotion Stimulus Detection in German News Headlines

Authors: Bao Minh Doan Dang, Laura Oberländer, Roman Klinger
Comments: Accepted at KONVENS 2021
Link: https://arxiv.org/abs/2107.12920
Abstract: Emotion stimulus extraction is a fine-grained subtask of emotion analysis that focuses on identifying the description of the cause behind an emotion expression in a text passage (e.g., in the sentence "I am happy that I passed my exam", the phrase "passed my exam" corresponds to the stimulus). Previous work mainly focused on Mandarin and English, with no resources or models for German. We fill this research gap by developing a corpus of 2006 German news headlines annotated with emotions and 811 instances with annotations of stimulus phrases. Given that such corpus creation efforts are time-consuming and expensive, we additionally work on an approach for projecting the existing English GoodNewsEveryone (GNE) corpus onto a machine-translated German version. We compare the performance of a conditional random field (CRF) model (trained monolingually on German and cross-lingually via projection) with a multilingual XLM-RoBERTa (XLM-R) model. Our results show that training with the German corpus achieves higher F1 scores than projection. Experiments with XLM-R outperform their respective CRF counterparts.

【2】 Energy-based Unknown Intent Detection with Data Manipulation

Authors: Yawen Ouyang, Jiasheng Ye, Yu Chen, Xinyu Dai, Shujian Huang, Jiajun Chen
Affiliations: National Key Laboratory for Novel Software Technology, Nanjing University, China
Comments: 10 pages, 4 figures, accepted by Findings of ACL-IJCNLP 2021
Link: https://arxiv.org/abs/2107.12542
Abstract: Unknown intent detection aims to identify out-of-distribution (OOD) utterances whose intents have never appeared in the training set. In this paper, we propose using energy scores for this task, as the energy score is theoretically aligned with the density of the input and can be derived from any classifier. However, high-quality OOD utterances are required during the training stage in order to shape the energy gap between OOD and in-distribution (IND), and such utterances are difficult to collect in practice. To tackle this problem, we propose a data manipulation framework to Generate high-quality OOD utterances with importance weighTs (GOT). Experimental results show that the energy-based detector fine-tuned by GOT achieves state-of-the-art results on two benchmark datasets.
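The energy score the abstract builds on is commonly defined (following the energy-based OOD detection literature) as the negative temperature-scaled log-sum-exp of the classifier logits, so any trained classifier yields it for free. A minimal sketch under that standard definition:

```python
import math

def energy_score(logits, temperature=1.0):
    """E(x) = -T * log(sum_k exp(f_k(x) / T)), computed with the max-shift
    trick for numerical stability. Lower energy corresponds to higher
    (unnormalized) density, i.e. more likely in-distribution."""
    m = max(l / temperature for l in logits)
    return -temperature * (m + math.log(sum(math.exp(l / temperature - m)
                                            for l in logits)))

# A peaked (confident) logit vector yields lower energy than a flat,
# uncertain one over the same number of classes, which is what lets a
# threshold on the energy separate IND from OOD utterances.
ind_energy = energy_score([9.0, 0.1, 0.2])  # confident in-distribution case
ood_energy = energy_score([0.3, 0.2, 0.1])  # flat, OOD-looking case
```

A detector then flags an utterance as unknown intent when its energy exceeds a threshold tuned on validation data; the paper's contribution is generating the OOD training utterances needed to shape that gap.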

Recognition / Classification (2 papers)

【1】 Emotion Recognition under Consideration of the Emotion Component Process Model

Authors: Felix Casel, Amelie Heindl, Roman Klinger
Affiliations: Institut für Maschinelle Sprachverarbeitung, University of Stuttgart, Pfaffenwaldring, Stuttgart, Germany
Comments: Accepted at KONVENS 2021
Link: https://arxiv.org/abs/2107.12895
Abstract: Emotion classification in text is typically performed with neural network models which learn to associate linguistic units with emotions. While this often leads to good predictive performance, it only helps to a limited degree to understand how emotions are communicated in various domains. The emotion component process model (CPM) by Scherer (2005) is an interesting approach to explaining emotion communication. It states that emotions are a coordinated process of various subcomponents in reaction to an event, namely the subjective feeling, the cognitive appraisal, the expression, a physiological bodily reaction, and a motivational action tendency. We hypothesize that these components are associated with linguistic realizations: an emotion can be expressed by describing a physiological bodily reaction ("he was trembling") or the expression ("she smiled"), etc. We annotate existing literature and Twitter emotion corpora with emotion component classes and find that emotions on Twitter are predominantly expressed by event descriptions or subjective reports of the feeling, while in literature, authors prefer to describe what characters do, and leave the interpretation to the reader. We further include the CPM in a multitask learning model and find that this supports emotion categorization. The annotated corpora are available at https://www.ims.uni-stuttgart.de/data/emotion.

【2】 Improving Word Recognition in Speech Transcriptions by Decision-level Fusion of Stemming and Two-way Phoneme Pruning

Authors: Sunakshi Mehra, Seba Susan
Affiliations: Department of Information Technology, Shahbad Daulatpur, Main Bawana Road, Delhi, India
Comments: Accepted in International Advanced Computing Conference (2020)
Link: https://arxiv.org/abs/2107.12428
Abstract: We introduce an unsupervised approach for correcting highly imperfect speech transcriptions based on a decision-level fusion of stemming and two-way phoneme pruning. Transcripts are acquired from videos by extracting audio using the Ffmpeg framework and further converting the audio to text transcripts using the Google API. In the benchmark LRW dataset, there are 500 word categories, with 50 videos per class in mp4 format. All videos consist of 29 frames in a 1.16 s clip, and the word appears in the middle of the video. In our approach we tried to improve on the baseline accuracy of 9.34% by using stemming, phoneme extraction, filtering, and pruning. After applying the stemming algorithm to the text transcript and evaluating the results, we achieved 23.34% accuracy in word recognition. To convert words to phonemes we used the Carnegie Mellon University (CMU) pronouncing dictionary, which provides a phonetic mapping of English words to their pronunciations. A two-way phoneme pruning is proposed that comprises two non-sequential steps: 1) filtering and pruning the phonemes containing vowels and plosives; 2) filtering and pruning the phonemes containing vowels and fricatives. After obtaining the results of stemming and two-way phoneme pruning, we applied decision-level fusion, which led to an improvement of the word recognition rate up to 32.96%.
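The CMU-dictionary lookup and the two pruning branches in the abstract can be sketched as follows. The tiny hand-coded `CMU_SUBSET` and the keep-vowels-plus-one-consonant-class reading of "filtering and pruning" are assumptions for illustration; the real pipeline would load the full CMU pronouncing dictionary and apply the paper's exact filtering rules:

```python
# Hand-coded subset of CMU dict entries (stress digits stripped); illustrative only.
CMU_SUBSET = {
    "about": ["AH", "B", "AW", "T"],
    "passes": ["P", "AE", "S", "AH", "Z"],
}

# ARPAbet phoneme classes used by the two pruning branches.
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}
PLOSIVES = {"P", "B", "T", "D", "K", "G"}
FRICATIVES = {"F", "V", "TH", "DH", "S", "Z", "SH", "ZH", "HH"}

def prune(word: str, keep_consonants: set) -> list:
    """Keep only the phonemes that are vowels or in the given consonant class."""
    return [p for p in CMU_SUBSET[word] if p in VOWELS or p in keep_consonants]

# Branch 1: vowels + plosives; Branch 2: vowels + fricatives. The paper fuses
# the two branches (with the stemming output) at decision level.
branch1 = prune("about", PLOSIVES)
branch2 = prune("about", FRICATIVES)
```

Because the two branches are non-sequential, each produces an independent reduced phoneme signature, and matching a noisy transcript word against both signatures is what the decision-level fusion arbitrates.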

Other Neural Networks | Deep Learning | Models | Modeling (3 papers)

【1】 gaBERT -- an Irish Language Model

Authors: James Barry, Joachim Wagner, Lauren Cassidy, Alan Cowap, Teresa Lynn, Abigail Walsh, Mícheál J. Ó Meachair, Jennifer Foster
Link: https://arxiv.org/abs/2107.12930
Abstract: The BERT family of neural language models has become highly popular due to their ability to provide sequences of text with rich context-sensitive token encodings which are able to generalise well to many natural language processing tasks. Over 120 monolingual BERT models covering over 50 languages have been released, as well as a multilingual model trained on 104 languages. We introduce gaBERT, a monolingual BERT model for the Irish language. We compare our gaBERT model to multilingual BERT and show that gaBERT provides better representations for a downstream parsing task. We also show how different filtering criteria, vocabulary size, and the choice of subword tokenisation model affect downstream performance. We release gaBERT and related code to the community.

【2】 Cross-lingual Transferring of Pre-trained Contextualized Language Models

Authors: Zuchao Li, Kevin Parnow, Hai Zhao, Zhuosheng Zhang, Rui Wang, Masao Utiyama, Eiichiro Sumita
Affiliations: Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, AI Institute, Shanghai Jiao Tong University
Link: https://arxiv.org/abs/2107.12627
Abstract: Though pre-trained contextualized language models (PrLMs) have made a significant impact on NLP, training PrLMs in languages other than English can be impractical for two reasons: other languages often lack corpora sufficient for training powerful PrLMs, and, because of the commonalities among human languages, computationally expensive PrLM training for different languages is somewhat redundant. In this work, building upon recent works connecting cross-lingual model transfer and neural machine translation, we propose a novel cross-lingual model transferring framework for PrLMs: TreLM. To handle the symbol order and sequence length differences between languages, we propose an intermediate "TRILayer" structure that learns from these differences and creates a better transfer in our primary translation direction, as well as a new cross-lingual language modeling objective for transfer training. Additionally, we showcase an embedding alignment that adversarially adapts a PrLM's non-contextualized embedding space and the TRILayer structure to learn a text transformation network across languages, which addresses the vocabulary difference between languages. Experiments on both language understanding and structure parsing tasks show the proposed framework significantly outperforms language models trained from scratch with limited data in both performance and efficiency. Moreover, despite an insignificant performance loss compared to pre-training from scratch in resource-rich scenarios, our cross-lingual model transferring framework is significantly more economical.

【3】 Federated Learning Meets Natural Language Processing: A Survey

Authors: Ming Liu, Stella Ho, Mengqi Wang, Longxiang Gao, Yuan Jin, He Zhang
Affiliations: Deakin University
Comments: 19 pages
Link: https://arxiv.org/abs/2107.12603
Abstract: Federated learning aims to learn machine learning models from multiple decentralized edge devices (e.g. mobiles) or servers without sacrificing local data privacy. Recent natural language processing techniques rely on deep learning and large pre-trained language models. However, both big deep neural models and language models are trained with huge amounts of data which often lies on the server side. Since text data widely originates from end users, in this work we look into recent NLP models and techniques which use federated learning as the learning framework. Our survey discusses major challenges in federated natural language processing, including the algorithm challenges, system challenges, and privacy issues. We also provide a critical review of existing federated NLP evaluation methods and tools. Finally, we highlight the current research gaps and future directions.

Others (3 papers)

【1】 Measuring Daily-life Fear Perception Change: A Computational Study in the Context of COVID-19

Authors: Yuchen Chai, Juan Palacios, Jianghao Wang, Yichun Fan, Siqi Zheng
Affiliations: Massachusetts Institute of Technology, MA, USA; Chinese Academy of Sciences, Beijing, China
Comments: 15 pages
Link: https://arxiv.org/abs/2107.12606
Abstract: COVID-19, as a global health crisis, has triggered the fear emotion with unprecedented intensity. Besides the fear of getting infected, the outbreak of COVID-19 also created significant disruptions in people's daily lives and thus evoked intensive psychological responses only indirectly related to COVID-19 infections. Here, we construct an expressed-fear database using 16 million social media posts generated by 536 thousand users between January 1st, 2019 and August 31st, 2020 in China. We employ deep learning techniques to detect the fear emotion within each post and apply topic models to extract the central fear topics. Based on this database, we find that sleep disorders ("nightmare" and "insomnia") take up the largest share of fear-labeled posts in the pre-pandemic period (January 2019-December 2019) and significantly increase during COVID-19. We identify health- and work-related concerns as the two major sources of fear induced by COVID-19. We also detect gender differences, with females generating more posts containing daily-life fear sources during the COVID-19 period. This research adopts a data-driven approach to trace public emotion, which can be used to complement traditional surveys, achieve real-time emotion monitoring to discern societal concerns, and support policy decision-making.

【2】 Dual Slot Selector via Local Reliability Verification for Dialogue State Tracking

Authors: Jinyu Guo, Kai Shuang, Jijie Li, Zihan Wang
Affiliations: State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications; Graduate School of Information Science and Technology, The University of Tokyo
Comments: Accepted by ACL-IJCNLP 2021 main conference (long paper)
Link: https://arxiv.org/abs/2107.12578
Abstract: The goal of dialogue state tracking (DST) is to predict the current dialogue state given all previous dialogue contexts. Existing approaches generally predict the dialogue state at every turn from scratch. However, the overwhelming majority of the slots in each turn should simply inherit the slot values from the previous turn. Therefore, the mechanism of treating slots equally in each turn is not only inefficient but may also lead to additional errors because of redundant slot value generation. To address this problem, we devise the two-stage DSS-DST, which consists of the Dual Slot Selector based on the current-turn dialogue and the Slot Value Generator based on the dialogue history. The Dual Slot Selector determines, for each slot, whether to update the slot value or to inherit it from the previous turn, from two aspects: (1) whether there is a strong relationship between the slot and the current-turn dialogue utterances; (2) whether a slot value with high reliability can be obtained for the slot through the current-turn dialogue. The slots selected for updating are passed to the Slot Value Generator to update their values by a hybrid method, while the other slots directly inherit the values from the previous turn. Empirical results show that our method achieves 56.93%, 60.73%, and 58.04% joint accuracy on the MultiWOZ 2.0, MultiWOZ 2.1, and MultiWOZ 2.2 datasets respectively, a new state-of-the-art performance with significant improvements.
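The inherit-vs-update decision at the heart of DSS-DST can be illustrated with a toy state-update routine. The slot names, the scalar confidence, and the fixed threshold below are hypothetical stand-ins for the paper's two learned selection criteria, kept only to show the control flow:

```python
def update_state(prev_state: dict, turn_predictions: dict,
                 confidence: dict, threshold: float = 0.5) -> dict:
    """Toy sketch of the DSS-DST control flow: a slot value is regenerated only
    when the current turn supports it with high enough confidence; every other
    slot silently inherits its value from the previous turn's state."""
    state = dict(prev_state)  # start from the inherited previous-turn state
    for slot, value in turn_predictions.items():
        if confidence.get(slot, 0.0) >= threshold:
            state[slot] = value  # selector says: update from the current turn
    return state

# One turn: the user clearly changed the area, but said nothing reliable
# about price, so only "hotel-area" is updated and "hotel-price" is inherited.
prev = {"hotel-area": "north", "hotel-price": "cheap"}
preds = {"hotel-area": "south", "hotel-price": "expensive"}
conf = {"hotel-area": 0.9, "hotel-price": 0.2}
new_state = update_state(prev, preds, conf)
```

The point of the design is exactly this asymmetry: regeneration is reserved for the few slots the current turn actually evidences, which avoids the redundant-generation errors the abstract describes.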

【3】 Language Grounding with 3D Objects

Authors: Jesse Thomason, Mohit Shridhar, Yonatan Bisk, Chris Paxton, Luke Zettlemoyer
Affiliations: University of Southern California; University of Washington; Carnegie Mellon University; NVIDIA
Comments: this https URL
Link: https://arxiv.org/abs/2107.12514
Abstract: Seemingly simple natural language requests to a robot are generally underspecified, for example "Can you bring me the wireless mouse?" When viewing mice on a shelf, the number of buttons or the presence of a wire may not be visible from certain angles or positions. Flat images of candidate mice may not provide the discriminative information needed for "wireless". The world, and the objects in it, are not flat images but complex 3D shapes. If a human requests an object based on any of its basic properties, such as color, shape, or texture, robots should perform the necessary exploration to accomplish the task. In particular, while substantial effort and progress has been made on understanding explicitly visual attributes like color and category, comparatively little progress has been made on understanding language about shapes and contours. In this work, we introduce a novel reasoning task that targets both visual and non-visual language about 3D objects. Our new benchmark, ShapeNet Annotated with Referring Expressions (SNARE), requires a model to choose which of two objects is being referenced by a natural language description. We introduce several CLIP-based models for distinguishing objects and demonstrate that while recent advances in jointly modeling vision and language are useful for robotic language understanding, these models remain weaker at understanding the 3D nature of objects -- properties which play a key role in manipulation. In particular, we find that adding view estimation to language grounding models improves accuracy both on SNARE and when identifying objects referred to in language on a robot platform.

Shared from the WeChat official account arXiv每日学术速递 as part of the Tencent Cloud self-media syndication program. Originally published 2021-07-28.
