Named entity recognition (NER) is an extensively studied task that extracts named entities from text and classifies them. NER is crucial not only in downstream language processing applications such as relation extraction and question answering, but also in large-scale big data operations such as real-time analysis of online digital media content. Recent research on Turkish, a less-studied language with a morphologically rich nature, has demonstrated the effectiveness of neural architectures on well-formed texts and yielded state-of-the-art results by formulating the task as a sequence tagging problem. In this work, we empirically investigate, in the same setting, the use of recent neural architectures (bidirectional long short-term memory and Transformer-based networks) proposed for Turkish NER tagging. Our results show that Transformer-based networks, which can model long-range context, overcome the limitations of BiLSTM networks, in which different input features at the character, subword, and word levels are utilized. We also propose a Transformer-based network with a conditional random field (CRF) layer that achieves a state-of-the-art result (95.95% f-measure) on a common dataset. Our study contributes to the literature quantifying the impact of transfer learning on processing morphologically rich languages.
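The abstract formulates NER as a sequence tagging problem, where each token receives an entity label. As a minimal, illustrative sketch (not the authors' code; the tokens and spans below are hypothetical), entity span annotations can be converted into the token-level BIO labels such a tagger predicts:

```python
def spans_to_bio(tokens, spans):
    """Convert (start, end, type) entity spans over token indices into
    BIO labels (B- marks an entity's first token, I- its continuation,
    O any token outside an entity). `end` is exclusive."""
    labels = ["O"] * len(tokens)
    for start, end, ent_type in spans:
        labels[start] = f"B-{ent_type}"
        for i in range(start + 1, end):
            labels[i] = f"I-{ent_type}"
    return labels

# Hypothetical Turkish example; tokenization and spans are illustrative.
tokens = ["Mustafa", "Kemal", "Ankara", "'ya", "gitti", "."]
spans = [(0, 2, "PER"), (2, 3, "LOC")]
print(spans_to_bio(tokens, spans))
# → ['B-PER', 'I-PER', 'B-LOC', 'O', 'O', 'O']
```

A BiLSTM or Transformer encoder then scores each token over this label set, and a CRF layer, as in the paper's proposed model, decodes the highest-scoring label sequence rather than picking each token's label independently.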
Original title: An Evaluation of Recent Neural Sequence Tagging Models in Turkish Named Entity Recognition
Original abstract: Named entity recognition (NER) is an extensively studied task that extracts and classifies named entities in a text. NER is crucial not only in downstream language processing applications such as relation extraction and question answering but also in large scale big data operations such as real-time analysis of online digital media content. Recent research efforts on Turkish, a less studied language with morphologically rich nature, have demonstrated the effectiveness of neural architectures on well-formed texts and yielded state-of-the-art results by formulating the task as a sequence tagging problem. In this work, we empirically investigate the use of recent neural architectures (Bidirectional long short-term memory and Transformer-based networks) proposed for Turkish NER tagging in the same setting. Our results demonstrate that transformer-based networks which can model long-range context overcome the limitations of BiLSTM networks where different input features at the character, subword, and word levels are utilized. We also propose a transformer-based network with a conditional random field (CRF) layer that leads to the state-of-the-art result (95.95% f-measure) on a common dataset. Our study contributes to the literature that quantifies the impact of transfer learning on processing morphologically rich languages.
Original authors: Gizem Aras, Didem Makaroglu, Seniz Demir, Altan Caki
Original link: https://arxiv.org/abs/2005.07692
Statement of originality: this article is published on the Cloud+ Community with the author's authorization; reproduction without permission is prohibited.