Spelling error correction is an important yet challenging task because a satisfactory solution essentially requires human-level language understanding ability. Without loss of generality, we consider Chinese spelling error correction (CSC) in this paper. A state-of-the-art method for the task selects a character from a list of candidates for correction (including non-correction) at each position of the sentence on the basis of BERT, the language representation model. The accuracy of the method can be sub-optimal, however, because BERT does not have sufficient capability to detect whether there is an error at each position, apparently due to the way it is pre-trained with masked language modeling. In this work, we propose a novel neural architecture to address the aforementioned issue, consisting of a network for error detection and a network for error correction based on BERT, with the former connected to the latter by what we call the soft-masking technique. Our method using `Soft-Masked BERT' is general, and it may be employed in other language detection-correction problems. Experimental results on two datasets demonstrate that the performance of our proposed method is significantly better than the baselines, including the one solely based on BERT.

Original title: Spelling Error Correction with Soft-Masked BERT
Authors: Shaohua Zhang, Haoran Huang, Jicong Liu, Hang Li
Source: https://arxiv.org/abs/2005.07421
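The soft-masking connection described in the abstract can be illustrated concretely: the detection network outputs, for each token, a probability that the character is erroneous, and the input to the correction network is an interpolation between the token's embedding and the `[MASK]` embedding, weighted by that probability. Below is a minimal NumPy sketch of this interpolation; the toy embeddings, the stand-in `[MASK]` vector, and the error probabilities are all hypothetical values for illustration, not from the paper.

```python
import numpy as np

def soft_mask(embeddings, error_probs, mask_embedding):
    """Soft-masking: interpolate each token embedding with the [MASK]
    embedding, weighted by the detection network's error probability:
        e_i' = p_i * e_mask + (1 - p_i) * e_i
    embeddings:     (seq_len, dim) token embeddings
    error_probs:    (seq_len,) per-token error probabilities in [0, 1]
    mask_embedding: (dim,) embedding of the [MASK] token
    """
    p = error_probs[:, None]                       # (seq_len, 1) for broadcasting
    return p * mask_embedding + (1.0 - p) * embeddings

# Toy example: 3 tokens with 4-dimensional embeddings.
emb = np.array([[1.0, 0.0, 0.0, 0.0],
                [0.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 0.0]])
mask = np.full(4, 0.5)                             # stand-in [MASK] embedding
probs = np.array([0.0, 1.0, 0.5])                  # p=0 keeps e_i; p=1 yields e_mask
out = soft_mask(emb, probs, mask)
```

With `p = 0` the original embedding passes through unchanged, with `p = 1` the token is fully masked, and intermediate probabilities blend the two, which is how the detection network "softly" tells the BERT-based correction network where errors are likely.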
Original-content statement: this article is published on the Cloud+ Community with the author's authorization and may not be reproduced without permission.