
Natural Language Processing Academic Digest [6.21]

Author: arXiv每日学术速递 (WeChat official account)
Published 2021-07-02 17:42:16
Included in the column: arXiv每日学术速递

Visit www.arxivdaily.com for digests with abstracts, covering CS, Physics, Math, Economics, Statistics, Finance, Biology, and Electrical Engineering, plus search, bookmarking, and posting features!

cs.CL: 24 papers today.

Transformer (2 papers)

【1】 BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models

Authors: Elad Ben Zaken, Shauli Ravfogel, Yoav Goldberg
Affiliations: Computer Science Department, Bar Ilan University; Allen Institute for Artificial Intelligence
Link: https://arxiv.org/abs/2106.10199
Abstract: We show that with small-to-medium training data, fine-tuning only the bias terms (or a subset of the bias terms) of pre-trained BERT models is competitive with (and sometimes better than) fine-tuning the entire model. For larger data, bias-only fine-tuning is competitive with other sparse fine-tuning methods. Besides their practical utility, these findings are relevant for the question of understanding the commonly-used process of finetuning: they support the hypothesis that finetuning is mainly about exposing knowledge induced by language-modeling training, rather than learning new task-specific linguistic knowledge.
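Since the recipe here is literally "train only the bias terms," it is easy to sketch. A minimal PyTorch illustration with a toy encoder standing in for BERT (the model, sizes, and optimizer are illustrative, not the authors' exact setup):

```python
# BitFit-style fine-tuning sketch: freeze everything except bias terms.
import torch
import torch.nn as nn

encoder_layer = nn.TransformerEncoderLayer(d_model=128, nhead=4, batch_first=True)
model = nn.TransformerEncoder(encoder_layer, num_layers=2)  # stand-in for BERT

for name, param in model.named_parameters():
    # Only parameters whose name ends in "bias" remain trainable.
    param.requires_grad = name.endswith("bias")

trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-3)
print(f"trainable: {sum(p.numel() for p in trainable)} of "
      f"{sum(p.numel() for p in model.parameters())} parameters")
```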

【2】 Multi-mode Transformer Transducer with Stochastic Future Context

Authors: Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu J. Han, Shinji Watanabe
Affiliations: ASAPP, USA; Carnegie Mellon University, USA
Note: Accepted to Interspeech 2021
Link: https://arxiv.org/abs/2106.09760
Abstract: Automatic speech recognition (ASR) models make fewer errors when more surrounding speech information is presented as context. Unfortunately, acquiring a larger future context leads to higher latency. There exists an inevitable trade-off between speed and accuracy. Naively, to fit different latency requirements, people have to store multiple models and pick the best one under the constraints. Instead, a more desirable approach is to have a single model that can dynamically adjust its latency based on different constraints, which we refer to as Multi-mode ASR. A Multi-mode ASR model can fulfill various latency requirements during inference -- when a larger latency becomes acceptable, the model can process longer future context to achieve higher accuracy and when a latency budget is not flexible, the model can be less dependent on future context but still achieve reliable accuracy. In pursuit of Multi-mode ASR, we propose Stochastic Future Context, a simple training procedure that samples one streaming configuration in each iteration. Through extensive experiments on AISHELL-1 and LibriSpeech datasets, we show that a Multi-mode ASR model rivals, if not surpasses, a set of competitive streaming baselines trained with different latency budgets.
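The training trick is easy to picture: at each iteration, sample one streaming configuration (here, a right-context width) and build the matching self-attention mask. A hedged sketch; the candidate widths are invented and the transducer itself is omitted:

```python
# Stochastic future context sketch: sample a right-context width per step.
import random
import torch

def streaming_mask(seq_len: int, future: int) -> torch.Tensor:
    # Position i may attend to positions j <= i + future.
    idx = torch.arange(seq_len)
    return idx.unsqueeze(0) <= idx.unsqueeze(1) + future  # (seq_len, seq_len)

for step in range(3):  # stand-in for the training loop
    future = random.choice([0, 2, 4, 8])  # one sampled streaming configuration
    mask = streaming_mask(seq_len=6, future=future)
    # pass `mask` to the encoder's self-attention for this iteration
    print(step, future, mask.int().tolist())
```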

BERT (1 paper)

【1】 SPBERT: Pre-training BERT on SPARQL Queries for End-to-end Question Answering over Knowledge Graphs

Authors: Hieu Tran, Long Phan, Truong-Son Nguyen
Affiliations: University of Science, Ho Chi Minh City, Vietnam; Vietnam National University, Ho Chi Minh City, Vietnam; Case Western Reserve University, Ohio, USA
Link: https://arxiv.org/abs/2106.09997
Abstract: We aim to create an unprecedented attempt to build an end-to-end Question Answering (QA) over Knowledge Graphs (KGs), which can construct SPARQL queries from natural language questions and generate a verbalized answer to its queries. Hence, we introduce SPBERT, a Transformer-based language model pre-trained on massive SPARQL query logs. By incorporating masked language modelling objective and word structural objective, SPBERT can learn general-purpose representations in both natural language and SPARQL query language and make the most of the sequential order of words that are crucial for structured language like SPARQL. In this paper, we investigate how SPBERT and encoder-decoder architecture can be adapted for Knowledge-based QA corpora. We conduct exhaustive experiments on two auxiliary tasks, including SPARQL Query Construction and Answer Verbalization Generation. Results show that SPBERT obtains promising performance and achieves state-of-the-art results on several of these tasks.
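As a rough picture of the masked-language-modelling objective applied to a SPARQL query (the whitespace tokenization and 15% mask rate are simplifications, and the paper's word structural objective is not shown):

```python
# Token-level masking over a SPARQL query, MLM-style.
import random

query = "SELECT ?x WHERE { ?x rdf:type dbo:Scientist }".split()
masked = ["[MASK]" if random.random() < 0.15 else tok for tok in query]
print(" ".join(masked))
```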

QA|VQA|Question Answering|Dialogue (1 paper)

【1】 Continuity of Topic, Interaction, and Query: Learning to Quote in Online Conversations

Authors: Lingzhi Wang, Jing Li, Xingshan Zeng, Haisong Zhang, Kam-Fai Wong
Affiliations: The Chinese University of Hong Kong, Hong Kong, China; MoE Key Laboratory of High Confidence Software Technologies, China; Department of Computing, The Hong Kong Polytechnic University, Hong Kong, China; Tencent AI Lab, China
Note: Accepted by EMNLP 2020, updated with dataset link
Link: https://arxiv.org/abs/2106.09896
Abstract: Quotations are crucial for successful explanations and persuasions in interpersonal communications. However, finding what to quote in a conversation is challenging for both humans and machines. This work studies automatic quotation generation in an online conversation and explores how language consistency affects whether a quotation fits the given context. Here, we capture the contextual consistency of a quotation in terms of latent topics, interactions with the dialogue history, and coherence to the query turn's existing content. Further, an encoder-decoder neural framework is employed to continue the context with a quotation via language generation. Experiment results on two large-scale datasets in English and Chinese demonstrate that our quotation generation model outperforms the state-of-the-art models. Further analysis shows that topic, interaction, and query consistency are all helpful to learn how to quote in online conversations.

Machine Translation (1 paper)

【1】 Recurrent Stacking of Layers in Neural Networks: An Application to Neural Machine Translation

Authors: Raj Dabre, Atsushi Fujita
Affiliations: National Institute of Information and Communications Technology, Kyoto, Japan
Note: 22 pages. Under review. Work in progress. Extended version of this https URL which is an extension of arXiv:1807.05353. The focus is on analyzing the limitations of recurrently stacked layers and methods to overcome said limitations
Link: https://arxiv.org/abs/2106.10002
Abstract: In deep neural network modeling, the most common practice is to stack a number of recurrent, convolutional, or feed-forward layers in order to obtain high-quality continuous space representations which in turn improves the quality of the network's prediction. Conventionally, each layer in the stack has its own parameters which leads to a significant increase in the number of model parameters. In this paper, we propose to share parameters across all layers thereby leading to a recurrently stacked neural network model. We report on an extensive case study on neural machine translation (NMT), where we apply our proposed method to an encoder-decoder based neural network model, i.e., the Transformer model, and experiment with three Japanese--English translation datasets. We empirically demonstrate that the translation quality of a model that recurrently stacks a single layer 6 times, despite having significantly fewer parameters, approaches that of a model that stacks 6 layers where each layer has different parameters. We also explore the limits of recurrent stacking where we train extremely deep NMT models. This paper also examines the utility of our recurrently stacked model as a student model through transfer learning via leveraging pre-trained parameters and knowledge distillation, and shows that it compensates for the performance drops in translation quality that the direct training of recurrently stacked model brings. We also show how transfer learning helps in faster decoding on top of the already reduced number of parameters due to recurrent stacking. Finally, we analyze the effects of recurrently stacked layers by visualizing the attentions of models that use recurrently stacked layers and models that do not.
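The core idea -- one layer's parameters applied repeatedly -- takes only a few lines in PyTorch. A hedged sketch with illustrative sizes, not the authors' NMT Transformer:

```python
# Recurrent stacking: reuse a single layer's weights for every "stacked" pass.
import torch
import torch.nn as nn

class RecurrentlyStackedEncoder(nn.Module):
    def __init__(self, d_model: int = 128, nhead: int = 4, repeats: int = 6):
        super().__init__()
        self.layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.repeats = repeats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.repeats):
            x = self.layer(x)  # the same weights on every pass
        return x

model = RecurrentlyStackedEncoder()
print(model(torch.randn(2, 10, 128)).shape)  # torch.Size([2, 10, 128])
```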

Semantic Analysis (1 paper)

【1】 Enhancing user creativity: Semantic measures for idea generation

Authors: Georgi V. Georgiev, Danko D. Georgiev
Affiliations: Center for Ubiquitous Computing, University of Oulu, Oulu, Finland; Institute for Advanced Study, Varna, Bulgaria
Link: https://arxiv.org/abs/2106.10131
Abstract: Human creativity generates novel ideas to solve real-world problems. This thereby grants us the power to transform the surrounding world and extend our human attributes beyond what is currently possible. Creative ideas are not just new and unexpected, but are also successful in providing solutions that are useful, efficient and valuable. Thus, creativity optimizes the use of available resources and increases wealth. The origin of human creativity, however, is poorly understood, and semantic measures that could predict the success of generated ideas are currently unknown. Here, we analyze a dataset of design problem-solving conversations in real-world settings by using 49 semantic measures based on WordNet 3.1 and demonstrate that a divergence of semantic similarity, an increased information content, and a decreased polysemy predict the success of generated ideas. The first feedback from clients also enhances information content and leads to a divergence of successful ideas in creative problem solving. These results advance cognitive science by identifying real-world processes in human problem solving that are relevant to the success of produced solutions and provide tools for real-time monitoring of problem solving, student training and skill acquisition. A selected subset of information content (IC Sánchez-Batet) and semantic similarity (Lin/Sánchez-Batet) measures, which are both statistically powerful and computationally fast, could support the development of technologies for computer-assisted enhancements of human creativity or for the implementation of creativity in machines endowed with general artificial intelligence.
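For readers who want to compute comparable measures, NLTK exposes Lin similarity over WordNet. Note this example uses the Brown-corpus information content shipped with NLTK data, whereas the paper favors the Sánchez-Batet intrinsic IC:

```python
# Requires NLTK plus its "wordnet" and "wordnet_ic" data packages,
# e.g. nltk.download("wordnet"); nltk.download("wordnet_ic").
from nltk.corpus import wordnet as wn, wordnet_ic

brown_ic = wordnet_ic.ic("ic-brown.dat")  # stand-in for the Sánchez-Batet IC
idea, concept = wn.synset("idea.n.01"), wn.synset("concept.n.01")
print(idea.lin_similarity(concept, brown_ic))  # Lin similarity in [0, 1]
```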

Graph|Knowledge Graph|Knowledge (3 papers)

【1】 Graph-based Joint Pandemic Concern and Relation Extraction on Twitter

Authors: Jingli Shi, Weihua Li, Sira Yongchareon, Yi Yang, Quan Bai
Affiliations: Engineering, Computer and Mathematical Sciences, Auckland University of Technology, Auckland, New Zealand; Applied Artificial Intelligence Institute, Deakin University, Victoria, Australia; Information and Communication Technology, University of Tasmania
Link: https://arxiv.org/abs/2106.09929
Abstract: Public concern detection provides potential guidance to the authorities for crisis management before or during a pandemic outbreak. Detecting people's concerns and attention from online social media platforms has been widely acknowledged as an effective approach to relieve public panic and prevent a social crisis. However, detecting concerns in time from massive information in social media turns out to be a big challenge, especially when sufficient manually labeled data is in the absence of public health emergencies, e.g., COVID-19. In this paper, we propose a novel end-to-end deep learning model to identify people's concerns and the corresponding relations based on Graph Convolutional Network and Bi-directional Long Short Term Memory integrated with Concern Graph. Except for the sequential features from BERT embeddings, the regional features of tweets can be extracted by the Concern Graph module, which not only benefits the concern detection but also enables our model to be highly noise-tolerant. Thus, our model can address the issue of insufficient manually labeled data. We conduct extensive experiments to evaluate the proposed model by using both manually labeled tweets and automatically labeled tweets. The experimental results show that our model can outperform the state-of-the-art models on real-world datasets.

【2】 A Neural Edge-Editing Approach for Document-Level Relation Graph Extraction

Authors: Kohei Makino, Makoto Miwa, Yutaka Sasaki
Affiliations: Toyota Technological Institute, Nagoya, Japan
Note: Accepted for publication at the Findings of the Association for Computational Linguistics (Findings-ACL 2021). 10 pages, 6 figures, 8 tables
Link: https://arxiv.org/abs/2106.09900
Abstract: In this paper, we propose a novel edge-editing approach to extract relation information from a document. We treat the relations in a document as a relation graph among entities in this approach. The relation graph is iteratively constructed by editing edges of an initial graph, which might be a graph extracted by another system or an empty graph. The way to edit edges is to classify them in a close-first manner using the document and temporally-constructed graph information; each edge is represented with a document context information by a pretrained transformer model and a graph context information by a graph convolutional neural network model. We evaluate our approach on the task to extract material synthesis procedures from materials science texts. The experimental results show the effectiveness of our approach in editing the graphs initialized by our in-house rule-based system and empty graphs.

【3】 Multi-Task Learning and Adapted Knowledge Models for Emotion-Cause Extraction

Authors: Elsbeth Turcan, Shuai Wang, Rishita Anubhai, Kasturi Bhattacharjee, Yaser Al-Onaizan, Smaranda Muresan
Affiliations: Department of Computer Science, Columbia University; Data Science Institute, Columbia University; Amazon AI
Note: 15 pages, 6 figures. Findings of ACL 2021
Link: https://arxiv.org/abs/2106.09790
Abstract: Detecting what emotions are expressed in text is a well-studied problem in natural language processing. However, research on finer grained emotion analysis such as what causes an emotion is still in its infancy. We present solutions that tackle both emotion recognition and emotion cause detection in a joint fashion. Considering that common-sense knowledge plays an important role in understanding implicitly expressed emotions and the reasons for those emotions, we propose novel methods that combine common-sense knowledge via adapted knowledge models with multi-task learning to perform joint emotion classification and emotion cause tagging. We show performance improvement on both tasks when including common-sense reasoning and a multitask framework. We provide a thorough analysis to gain insights into model performance.

Summarization|Information Extraction (1 paper)

【1】 Subjective Bias in Abstractive Summarization

Authors: Lei Li, Wei Liu, Marina Litvak, Natalia Vanetik, Jiacheng Pei, Yinan Liu, Siya Qi
Affiliations: Beijing University of Posts and Telecommunications; Shamoon College of Engineering
Note: 10 pages, 7 figures, 4 tables
Link: https://arxiv.org/abs/2106.10084
Abstract: Due to the subjectivity of the summarization, it is a good practice to have more than one gold summary for each training document. However, many modern large-scale abstractive summarization datasets have only one-to-one samples written by different humans with different styles. The impact of this phenomenon is understudied. We formulate the differences among possible multiple expressions summarizing the same content as subjective bias and examine the role of this bias in the context of abstractive summarization. In this paper a lightweight and effective method to extract the feature embeddings of subjective styles is proposed. Results of summarization models trained on style-clustered datasets show that there are certain types of styles that lead to better convergence, abstraction and generalization. The reproducible code and generated summaries are available online.
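The style-clustering step can be pictured in a few lines; the random features and cluster count below are placeholders for the paper's lightweight style-embedding extraction:

```python
# Cluster style embeddings, then train one summarizer per cluster.
from sklearn.cluster import KMeans
import numpy as np

style_embeddings = np.random.rand(100, 16)  # stand-in for extracted style features
clusters = KMeans(n_clusters=4, n_init=10).fit_predict(style_embeddings)
# Next step (not shown): train a separate model on each cluster's (doc, summary) pairs.
print(np.bincount(clusters))  # cluster sizes
```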

Reasoning|Analysis|Understanding|Explanation (1 paper)

【1】 Towards Financial Sentiment Analysis in a South African Landscape

Authors: Michelle Terblanche, Vukosi Marivate
Affiliations: Department of Computer Science, University of Pretoria, South Africa
Note: Accepted for publication in Proceedings of CD-MAKE 2021 Conference
Link: https://arxiv.org/abs/2106.10004
Abstract: Sentiment analysis as a sub-field of natural language processing has received increased attention in the past decade enabling organisations to more effectively manage their reputation through online media monitoring. Many drivers impact reputation, however, this thesis focuses only on the aspect of financial performance and explores the gap with regards to financial sentiment analysis in a South African context. Results showed that pre-trained sentiment analysers are least effective for this task and that traditional lexicon-based and machine learning approaches are best suited to predict financial sentiment of news articles. The evaluated methods produced accuracies of 84%-94%. The predicted sentiments correlated quite well with share price and highlighted the potential use of sentiment as an indicator of financial performance. A main contribution of the study was updating an existing sentiment dictionary for financial sentiment analysis. Model generalisation was less acceptable due to the limited amount of training data used. Future work includes expanding the data set to improve general usability and contribute to an open-source financial sentiment analyser for South African data.

Semi/Weakly/Un-supervised|Uncertainty (1 paper)

【1】 Weakly Supervised Pre-Training for Multi-Hop Retriever

Authors: Yeon Seonwoo, Sang-Woo Lee, Ji-Hoon Kim, Jung-Woo Ha, Alice Oh
Affiliations: KAIST; NAVER AI Lab; NAVER Clova
Note: ACL-Findings 2021
Link: https://arxiv.org/abs/2106.09983
Abstract: In multi-hop QA, answering complex questions entails iterative document retrieval for finding the missing entity of the question. The main steps of this process are sub-question detection, document retrieval for the sub-question, and generation of a new query for the final document retrieval. However, building a dataset that contains complex questions with sub-questions and their corresponding documents requires costly human annotation. To address the issue, we propose a new method for weakly supervised multi-hop retriever pre-training without human efforts. Our method includes 1) a pre-training task for generating vector representations of complex questions, 2) a scalable data generation method that produces the nested structure of question and sub-question as weak supervision for pre-training, and 3) a pre-training model structure based on dense encoders. We conduct experiments to compare the performance of our pre-trained retriever with several state-of-the-art models on end-to-end multi-hop QA as well as document retrieval. The experimental results show that our pre-trained retriever is effective and also robust on limited data and computational resources.

Detection (1 paper)

【1】 An Information Retrieval Approach to Building Datasets for Hate Speech Detection

Authors: Md Mustafizur Rahman, Dinesh Balakrishnan, Dhiraj Murthy, Mucahid Kutlu, Matthew Lease
Affiliations: School of Information, The University of Texas at Austin, USA; Department of Computer Science; School of Journalism and Media; Department of Computer Engineering, TOBB Economy and Tech. University, Turkey
Note: 10 pages (Under review in CIKM 2021)
Link: https://arxiv.org/abs/2106.09775
Abstract: Building a benchmark dataset for hate speech detection presents several challenges. Firstly, because hate speech is relatively rare -- e.g., less than 3% of Twitter posts are hateful (Founta et al., 2018) -- random sampling of tweets to annotate is inefficient in capturing hate speech. A common practice is to only annotate tweets containing known "hate words", but this risks yielding a biased benchmark that only partially captures the real-world phenomenon of interest. A second challenge is that definitions of hate speech tend to be highly variable and subjective. Annotators having diverse prior notions of hate speech may not only disagree with one another but also struggle to conform to specified labeling guidelines. Our key insight is that the rarity and subjectivity of hate speech are akin to that of relevance in information retrieval (IR). This connection suggests that well-established methodologies for creating IR test collections might also be usefully applied to create better benchmark datasets for hate speech detection. Firstly, to intelligently and efficiently select which tweets to annotate, we apply established IR techniques of pooling and active learning. Secondly, to improve both consistency and value of annotations, we apply task decomposition (Zhang et al., SIGIR 2014) and annotator rationale (McDonnell et al., HCOMP 2016) techniques. Using the above techniques, we create and share a new benchmark dataset for hate speech detection with broader coverage than prior datasets (to be released upon publication). We also show a dramatic drop in accuracy of existing detection models when tested on these broader forms of hate. Collected annotator rationales not only provide documented support for labeling decisions but also create exciting future work opportunities for dual-supervision and/or explanation generation in modeling.
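Of the borrowed IR techniques, pooling is the simplest to demonstrate: annotate only the union of each candidate system's top-k results rather than a random sample. A sketch with toy tweet ids:

```python
# IR-style pooling: pool the top-k results of several rankers for annotation.
def pool(rankings, k=2):
    pooled = set()
    for ranked_ids in rankings:  # each ranker's tweet ids, best first
        pooled.update(ranked_ids[:k])
    return pooled

print(pool([[1, 2, 3, 4], [3, 5, 6], [7, 1, 8]]))  # {1, 2, 3, 5, 7}
```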

Recognition/Classification (3 papers)

【1】 Fusion of Embeddings Networks for Robust Combination of Text Dependent and Independent Speaker Recognition

Authors: Ruirui Li, Chelsea J.-T. Ju, Zeya Chen, Hongda Mao, Oguz Elibol, Andreas Stolcke
Affiliations: Amazon Alexa Speech
Link: https://arxiv.org/abs/2106.10169
Abstract: By implicitly recognizing a user based on his/her speech input, speaker identification enables many downstream applications, such as personalized system behavior and expedited shopping checkouts. Based on whether the speech content is constrained or not, both text-dependent (TD) and text-independent (TI) speaker recognition models may be used. We wish to combine the advantages of both types of models through an ensemble system to make more reliable predictions. However, any such combined approach has to be robust to incomplete inputs, i.e., when either TD or TI input is missing. As a solution we propose a fusion of embeddings network foenet architecture, combining joint learning with neural attention. We compare foenet with four competitive baseline methods on a dataset of voice assistant inputs, and show that it achieves higher accuracy than the baseline and score fusion methods, especially in the presence of incomplete inputs.
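A sketch of attention-weighted fusion of TD and TI speaker embeddings that stays well-defined when one branch is missing. The scoring network and handling of absent inputs are assumptions for illustration, not the exact foenet architecture:

```python
# Attention-based fusion of TD/TI embeddings, robust to a missing branch.
import torch
import torch.nn as nn

class EmbeddingFusion(nn.Module):
    def __init__(self, dim: int = 192):
        super().__init__()
        self.score = nn.Linear(dim, 1)  # attention score per available embedding

    def forward(self, td, ti):  # either argument may be None
        embs = [e for e in (td, ti) if e is not None]
        scores = torch.cat([self.score(e) for e in embs], dim=-1)
        w = torch.softmax(scores, dim=-1)    # weights over the present inputs only
        stacked = torch.stack(embs, dim=-2)  # (..., n_present, dim)
        return (w.unsqueeze(-1) * stacked).sum(dim=-2)

fusion = EmbeddingFusion()
print(fusion(torch.randn(1, 192), None).shape)  # works with the TI input missing
```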

【2】 Label Mask for Multi-Label Text Classification

Authors: Rui Song, Xingbing Chen, Zelong Liu, Haining An, Zhiqi Zhang, Xiaoguang Wang, Hao Xu
Affiliations: School of Artificial Intelligence, Jilin University; Public Computer Education and Research Center, Jilin University; College of Computer Science and Technology, Key Laboratory of Symbolic Computing and
Link: https://arxiv.org/abs/2106.10076
Abstract: One of the key problems in multi-label text classification is how to take advantage of the correlation among labels. However, it is very challenging to directly model the correlations among labels in a complex and unknown label space. In this paper, we propose a Label Mask multi-label text classification model (LM-MTC), which is inspired by the idea of cloze questions of language models. LM-MTC is able to capture implicit relationships among labels through the powerful ability of pre-trained language models. On this basis, we assign a different token to each potential label, and randomly mask the token with a certain probability to build a label-based Masked Language Model (MLM). We train the MTC and MLM together, further improving the generalization ability of the model. A large number of experiments on multiple datasets demonstrate the effectiveness of our method.
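One way to picture the label-mask construction: append one dedicated token per potential label after the text, then mask those tokens at random so the MLM head must recover them. The positive/negative token encoding below is an assumption, not a detail the abstract specifies:

```python
# Label-mask input construction sketch for multi-label MLM training.
import random

LABELS = ["sports", "politics", "tech"]  # illustrative label inventory

def build_input(text_tokens, positive_labels, mask_prob=0.15):
    tokens, mlm_targets = list(text_tokens) + ["[SEP]"], []
    for lab in LABELS:
        tok = f"[YES_{lab}]" if lab in positive_labels else f"[NO_{lab}]"
        if random.random() < mask_prob:
            tokens.append("[MASK]")
            mlm_targets.append(tok)   # the MLM objective recovers this token
        else:
            tokens.append(tok)
            mlm_targets.append(None)  # not masked, no MLM loss here
    return tokens, mlm_targets

print(build_input(["great", "match", "tonight"], {"sports"}))
```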

【3】 On-Device Personalization of Automatic Speech Recognition Models for Disordered Speech

Authors: Katrin Tomanek, Françoise Beaufays, Julie Cattiau, Angad Chandorkar, Khe Chai Sim
Affiliations: Google, USA
Link: https://arxiv.org/abs/2106.10259
Abstract: While current state-of-the-art Automatic Speech Recognition (ASR) systems achieve high accuracy on typical speech, they suffer from significant performance degradation on disordered speech and other atypical speech patterns. Personalization of ASR models, a commonly applied solution to this problem, is usually performed in a server-based training environment posing problems around data privacy, delayed model-update times, and communication cost for copying data and models between mobile device and server infrastructure. In this paper, we present an approach to on-device based ASR personalization with very small amounts of speaker-specific data. We test our approach on a diverse set of 100 speakers with disordered speech and find median relative word error rate improvement of 71% with only 50 short utterances required per speaker. When tested on a voice-controlled home automation platform, on-device personalized models show a median task success rate of 81%, compared to only 40% of the unadapted models.

Representation (1 paper)

【1】 Investigating the Role of Negatives in Contrastive Representation Learning

Authors: Jordan T. Ash, Surbhi Goel, Akshay Krishnamurthy, Dipendra Misra
Affiliations: Microsoft Research NYC
Link: https://arxiv.org/abs/2106.09943
Abstract: Noise contrastive learning is a popular technique for unsupervised representation learning. In this approach, a representation is obtained via reduction to supervised learning, where given a notion of semantic similarity, the learner tries to distinguish a similar (positive) example from a collection of random (negative) examples. The success of modern contrastive learning pipelines relies on many parameters such as the choice of data augmentation, the number of negative examples, and the batch size; however, there is limited understanding as to how these parameters interact and affect downstream performance. We focus on disambiguating the role of one of these parameters: the number of negative examples. Theoretically, we show the existence of a collision-coverage trade-off suggesting that the optimal number of negative examples should scale with the number of underlying concepts in the data. Empirically, we scrutinize the role of the number of negatives in both NLP and vision tasks. In the NLP task, we find that the results broadly agree with our theory, while our vision experiments are murkier with performance sometimes even being insensitive to the number of negatives. We discuss plausible explanations for this behavior and suggest future directions to better align theory and practice.
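The objective under study is the standard noise-contrastive (InfoNCE-style) loss, where one positive is scored against K random negatives. A minimal sketch; the temperature and K are illustrative:

```python
# InfoNCE with K negatives: cross-entropy with the positive at index 0.
import torch
import torch.nn.functional as F

def info_nce(anchor, positive, negatives, tau=0.1):
    pos = (anchor @ positive / tau).view(1)  # similarity to the positive
    neg = negatives @ anchor / tau           # similarities to the K negatives
    logits = torch.cat([pos, neg]).unsqueeze(0)
    return F.cross_entropy(logits, torch.zeros(1, dtype=torch.long))

d, K = 8, 5
print(info_nce(torch.randn(d), torch.randn(d), torch.randn(K, d)))
```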

Word2Vec|Text|Words (2 papers)

【1】 Challenges and Limitations with the Metrics Measuring the Complexity of Code-Mixed Text

Authors: Vivek Srivastava, Mayank Singh
Affiliations: TCS Research, Pune, Maharashtra, India; IIT Gandhinagar, Gandhinagar, Gujarat, India
Link: https://arxiv.org/abs/2106.10123
Abstract: Code-mixing is a frequent communication style among multilingual speakers where they mix words and phrases from two different languages in the same utterance of text or speech. Identifying and filtering code-mixed text is a challenging task due to its co-existence with monolingual and noisy text. Over the years, several code-mixing metrics have been extensively used to identify and validate code-mixed text quality. This paper demonstrates several inherent limitations of code-mixing metrics with examples from the already existing datasets that are popularly used across various experiments.
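One widely used metric in this space is the Code-Mixing Index (CMI); a common per-utterance form is 100 * (1 - max_i(w_i) / (n - u)), where n is the token count, u the number of language-independent tokens, and w_i the tokens in language i. A sketch (whether this exact variant is among those the paper critiques is an assumption):

```python
# Per-utterance Code-Mixing Index over language tags.
from collections import Counter

def cmi(lang_tags):
    n = len(lang_tags)
    u = sum(1 for t in lang_tags if t == "univ")  # language-independent tokens
    if n == u:
        return 0.0  # no language-tagged tokens: not code-mixed
    counts = Counter(t for t in lang_tags if t != "univ")
    return 100.0 * (1.0 - max(counts.values()) / (n - u))

print(cmi(["en", "hi", "hi", "univ", "en"]))  # 50.0 for this mixed utterance
```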

【2】 LNN-EL: A Neuro-Symbolic Approach to Short-text Entity Linking

Authors: Hang Jiang, Sairam Gurajada, Qiuhao Lu, Sumit Neelam, Lucian Popa, Prithviraj Sen, Yunyao Li, Alexander Gray
Affiliations: MIT; IBM Research; University of Oregon
Note: Accepted to ACL 2021
Link: https://arxiv.org/abs/2106.09795
Abstract: Entity linking (EL), the task of disambiguating mentions in text by linking them to entities in a knowledge graph, is crucial for text understanding, question answering or conversational systems. Entity linking on short text (e.g., single sentence or question) poses particular challenges due to limited context. While prior approaches use either heuristics or black-box neural methods, here we propose LNN-EL, a neuro-symbolic approach that combines the advantages of using interpretable rules based on first-order logic with the performance of neural learning. Even though constrained to using rules, LNN-EL performs competitively against SotA black-box neural approaches, with the added benefits of extensibility and transferability. In particular, we show that we can easily blend existing rule templates given by a human expert, with multiple types of features (priors, BERT encodings, box embeddings, etc), and even scores resulting from previous EL methods, thus improving on such methods. For instance, on the LC-QuAD-1.0 dataset, we show more than 4% increase in F1 score over previous SotA. Finally, we show that the inductive bias offered by using logic results in learned rules that transfer well across datasets, even without fine tuning, while maintaining high accuracy.

Other Neural Networks|Deep Learning|Models|Modeling (1 paper)

【1】 Predicting gender of Brazilian names using deep learning

Authors: Rosana C. B. Rego, Verônica M. L. Silva
Affiliations: Program in Electrical and Computer Engineering, Federal University of Rio Grande do Norte, Brazil; Department of Engineering and Technology, Federal Rural University of Semi-Arid, Brazil
Note: 9 pages, 8 figures
Link: https://arxiv.org/abs/2106.10156
Abstract: Predicting gender by the name is not a simple task. In many applications, especially in the natural language processing (NLP) field, this task may be necessary, mainly when considering foreign names. Some machine learning algorithms can satisfactorily perform the prediction. In this paper, we examined and implemented feedforward and recurrent deep neural network models, such as MLP, RNN, GRU, CNN, and BiLSTM, to classify gender through the first name. A dataset of Brazilian names is used to train and evaluate the models. We analyzed the accuracy, recall, precision, and confusion matrix to measure the models' performances. The results indicate that the gender prediction can be performed from the feature extraction strategy looking at the names as a set of strings. Some models accurately predict the gender in more than 90% of the cases. The recurrent models overcome the feedforward models in this binary classification problem.
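A minimal character-level BiLSTM of the kind the paper evaluates; the vocabulary size, dimensions, and dummy input are illustrative, not the authors' exact configuration:

```python
# Character-level BiLSTM binary classifier over first names (sketch).
import torch
import torch.nn as nn

class CharBiLSTM(nn.Module):
    def __init__(self, vocab_size=30, emb=16, hidden=32):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb, padding_idx=0)
        self.lstm = nn.LSTM(emb, hidden, bidirectional=True, batch_first=True)
        self.out = nn.Linear(2 * hidden, 1)

    def forward(self, char_ids):
        h, _ = self.lstm(self.emb(char_ids))
        return torch.sigmoid(self.out(h[:, -1]))  # probability of one class

name = torch.tensor([[1, 2, 3, 4, 0, 0]])  # padded character ids for one name
print(CharBiLSTM()(name))
```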

Others (4 papers)

【1】 Synchronising speech segments with musical beats in Mandarin and English singing

Authors: Cong Zhang, Jian Zhu
Affiliations: Radboud University; University of Michigan
Note: To be published in the Proceedings of Interspeech 2021
Link: https://arxiv.org/abs/2106.10045
Abstract: Generating synthesised singing voice with models trained on speech data has many advantages due to the models' flexibility and controllability. However, since the information about the temporal relationship between segments and beats are lacking in speech training data, the synthesised singing may sound off-beat at times. Therefore, the availability of the information on the temporal relationship between speech segments and music beats is crucial. The current study investigated the segment-beat synchronisation in singing data, with hypotheses formed based on the linguistics theories of P-centre and sonority hierarchy. A Mandarin corpus and an English corpus of professional singing data were manually annotated and analysed. The results showed that the presence of musical beats was more dependent on segment duration than sonority. However, the sonority hierarchy and the P-centre theory were highly related to the location of beats. Mandarin and English demonstrated cross-linguistic variations despite exhibiting common patterns.

【2】 Bad Characters: Imperceptible NLP Attacks

Authors: Nicholas Boucher, Ilia Shumailov, Ross Anderson, Nicolas Papernot
Affiliations: University of Cambridge, Cambridge, United Kingdom; Vector Institute, University of Toronto, Toronto, Canada; University of Edinburgh
Link: https://arxiv.org/abs/2106.09898
Abstract: Several years of research have shown that machine-learning systems are vulnerable to adversarial examples, both in theory and in practice. Until now, such attacks have primarily targeted visual models, exploiting the gap between human and machine perception. Although text-based models have also been attacked with adversarial examples, such attacks struggled to preserve semantic meaning and indistinguishability. In this paper, we explore a large class of adversarial examples that can be used to attack text-based models in a black-box setting without making any human-perceptible visual modification to inputs. We use encoding-specific perturbations that are imperceptible to the human eye to manipulate the outputs of a wide range of Natural Language Processing (NLP) systems from neural machine-translation pipelines to web search engines. We find that with a single imperceptible encoding injection -- representing one invisible character, homoglyph, reordering, or deletion -- an attacker can significantly reduce the performance of vulnerable models, and with three injections most models can be functionally broken. Our attacks work against currently-deployed commercial systems, including those produced by Microsoft and Google, in addition to open source models published by Facebook and IBM. This novel series of attacks presents a significant threat to many language processing systems: an attacker can affect systems in a targeted manner without any assumptions about the underlying model. We conclude that text-based NLP systems require careful input sanitization, just like conventional applications, and that given such systems are now being deployed rapidly at scale, the urgent attention of architects and operators is required.
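Two of the four perturbation classes are easy to demonstrate: invisible-character injection and homoglyph substitution. The specific code points below are common examples, not necessarily the paper's exact choices:

```python
# Imperceptible perturbations: zero-width injection and Cyrillic homoglyphs.
ZWSP = "\u200b"  # zero-width space: invisible to readers, visible to tokenizers
HOMOGLYPHS = {"a": "\u0430", "e": "\u0435", "o": "\u043e"}  # Cyrillic look-alikes

def inject_invisible(text: str, pos: int) -> str:
    return text[:pos] + ZWSP + text[pos:]

def homoglyph_swap(text: str) -> str:
    return "".join(HOMOGLYPHS.get(c, c) for c in text)

s = "machine translation"
print(inject_invisible(s, 7) == s)  # False: the string changed imperceptibly
print(homoglyph_swap(s))            # renders near-identically to the original
```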

【3】 PRGC: Potential Relation and Global Correspondence Based Joint Relational Triple Extraction

Authors: Hengyi Zheng, Rui Wen, Xi Chen, Yifan Yang, Yunyan Zhang, Ziheng Zhang, Ningyu Zhang, Bin Qin, Ming Xu, Yefeng Zheng
Affiliations: College of Electronics and Information Engineering, Shenzhen University; Information Technology Center, Shenzhen University; Tencent Jarvis Lab, Shenzhen, China; Platform and Content Group, Tencent; Zhejiang University
Note: Accepted by ACL 2021
Link: https://arxiv.org/abs/2106.09895
Abstract: Joint extraction of entities and relations from unstructured texts is a crucial task in information extraction. Recent methods achieve considerable performance but still suffer from some inherent limitations, such as redundancy of relation prediction, poor generalization of span-based extraction and inefficiency. In this paper, we decompose this task into three subtasks, Relation Judgement, Entity Extraction and Subject-object Alignment from a novel perspective and then propose a joint relational triple extraction framework based on Potential Relation and Global Correspondence (PRGC). Specifically, we design a component to predict potential relations, which constrains the following entity extraction to the predicted relation subset rather than all relations; then a relation-specific sequence tagging component is applied to handle the overlapping problem between subjects and objects; finally, a global correspondence component is designed to align the subject and object into a triple with low-complexity. Extensive experiments show that PRGC achieves state-of-the-art performance on public benchmarks with higher efficiency and delivers consistent performance gain on complex scenarios of overlapping triples.
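The three-stage decomposition can be sketched at inference time; every tensor and threshold below is an illustrative stand-in for the trained components, not the paper's implementation:

```python
# PRGC-style inference sketch: relation filtering -> tagging -> alignment.
import torch

n_tokens, n_rel = 6, 4
rel_logits = torch.randn(n_rel)                                    # stage 1 scores
potential = (torch.sigmoid(rel_logits) > 0.5).nonzero().flatten()  # relation subset
corr = torch.sigmoid(torch.randn(n_tokens, n_tokens))              # stage 3 matrix

for r in potential.tolist():
    subj_starts, obj_starts = [0], [3]  # stand-ins for stage-2 tagger outputs
    for s in subj_starts:
        for o in obj_starts:
            if corr[s, o] > 0.5:        # keep only globally aligned pairs
                print((s, r, o))        # a (subject, relation, object) triple
```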

【4】 GEM: A General Evaluation Benchmark for Multimodal Tasks

Authors: Lin Su, Nan Duan, Edward Cui, Lei Ji, Chenfei Wu, Huaishao Luo, Yongfei Liu, Ming Zhong, Taroon Bharti, Arun Sacheti
Affiliations: Bing Multimedia Team, Microsoft, China; Natural Language Computing, Microsoft Research Asia, China; Southwest Jiaotong University, China; ShanghaiTech University, China; Bing Multimedia Team, Microsoft, United States
Note: Accepted by Findings of ACL 2021
Link: https://arxiv.org/abs/2106.09889
Abstract: In this paper, we present GEM as a General Evaluation benchmark for Multimodal tasks. Different from existing datasets such as GLUE, SuperGLUE, XGLUE and XTREME that mainly focus on natural language tasks, GEM is a large-scale vision-language benchmark, which consists of GEM-I for image-language tasks and GEM-V for video-language tasks. Comparing with existing multimodal datasets such as MSCOCO and Flicker30K for image-language tasks, YouCook2 and MSR-VTT for video-language tasks, GEM is not only the largest vision-language dataset covering image-language tasks and video-language tasks at the same time, but also labeled in multiple languages. We also provide two baseline models for this benchmark. We will release the dataset, code and baseline models, aiming to advance the development of multilingual multimodal research.


