Conference: CCF International Conference on Natural Language Processing and Chinese Computing

Automatic Chinese Spelling Checking and Correction Based on Character-Based Pre-trained Contextual Representations

Abstract

Automatic Chinese spelling checking and correction (CSC) remains a challenging task, especially when a sentence is complex in its semantics and expression. Moreover, a CSC model normally requires a large training corpus, which is usually unavailable. To capture the semantic information of sentences, this paper proposes an approach, named DPL-Corr, based on character-based pre-trained contextual representations, which significantly improves CSC performance. In DPL-Corr, the spelling checking module is a sequence-labeling model enhanced by deep contextual semantic analysis, and the spelling correction module is a masked language model integrated with multilayer filtering to obtain the final corrections. In experiments on the SIGHAN 2015 dataset, DPL-Corr performs significantly better on CSC than conventional models.
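
The two-module design described in the abstract (detect suspicious characters, then propose replacements with a masked language model and filter them) can be illustrated with a rough Python sketch. Everything here is an assumption made for illustration: the names `correct_sentence`, `toy_detect`, and `toy_masked_lm`, the confusion-set filter, and the probability threshold are hypothetical, and the abstract does not say which filters or models DPL-Corr actually uses.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical interfaces for illustration only; none of these come from the paper.
Detector = Callable[[str], List[int]]               # sentence -> indices of suspected characters
MaskedLM = Callable[[str, int], Dict[str, float]]   # (sentence, position) -> candidate probabilities


def correct_sentence(
    sentence: str,
    detect: Detector,
    masked_lm: MaskedLM,
    confusion_sets: Dict[str, Set[str]],
    min_prob: float = 0.5,
) -> Tuple[str, List[Tuple[int, str, str]]]:
    """Detect-then-correct pipeline with two assumed filtering layers."""
    chars = list(sentence)
    edits: List[Tuple[int, str, str]] = []
    for i in detect(sentence):
        original = chars[i]
        candidates = masked_lm("".join(chars), i)
        # Assumed filter 1: keep only characters listed as confusable with the original.
        allowed = confusion_sets.get(original, set())
        kept = {c: p for c, p in candidates.items() if c in allowed}
        # Assumed filter 2: drop candidates the language model is not confident about.
        kept = {c: p for c, p in kept.items() if p >= min_prob}
        if kept:
            best = max(kept, key=lambda c: kept[c])
            chars[i] = best
            edits.append((i, original, best))
    return "".join(chars), edits


def toy_detect(sentence: str) -> List[int]:
    # Stand-in for a sequence-labeling model: flags every occurrence of one character.
    return [i for i, ch in enumerate(sentence) if ch == "心"]


def toy_masked_lm(sentence: str, position: int) -> Dict[str, float]:
    # Stand-in for a pre-trained masked language model: returns a fixed distribution.
    return {"兴": 0.90, "大": 0.05, "心": 0.03}


if __name__ == "__main__":
    fixed, edits = correct_sentence(
        "我今天很高心", toy_detect, toy_masked_lm, {"心": {"兴", "新"}}
    )
    print(fixed, edits)
```

In a real system the two stubs would be replaced by trained models, for example a character-level tagger for detection and a pre-trained masked language model for candidate generation. Keeping them behind plain callables lets the filtering logic be tested on its own.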
