Conference: CIPS-SIGHAN joint conference on Chinese language processing

Chinese Spell Checking Based on Noisy Channel Model

Abstract

Chinese spell checking is an important component of many NLP applications, including word processors, search engines, and automatic essay rating. Because Chinese text has no word boundaries, and because different Chinese input methods produce different kinds of typos, spell checkers are harder to develop for Chinese than for English. In this paper, we introduce a novel method for correcting Chinese typographical errors based on sound or shape similarity. In our approach, similar characters are generated automatically from Web corpora, and potential typos in a given sentence are then corrected by combining a channel model and a character-based language model within the noisy channel framework. In the training phase, we estimate the channel probabilities for each character from n-grams in a Web corpus. At run time, the system generates correction candidates for each character in the given sentence and selects the appropriate correction using the channel model and the language model.
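
As an illustration only, the run-time step described in the abstract can be sketched in a few lines of Python. In the standard noisy channel formulation, the chosen correction C for an observed sentence S maximizes P(C) * P(S | C), where P(C) comes from the language model and P(S | C) from the channel model. Everything concrete below is an assumption made for the sketch and is not taken from the paper: the toy corpus, the single confusion entry and its probabilities, the add-k smoothing, the bigram order, and the names build_bigram_lm and correct.

```python
import math
from collections import Counter

# Toy training text (assumption; the paper estimates its models from Web corpora).
CORPUS = ["我今天很高兴", "今天天气很好", "他很高兴", "他很小心"]

# Hypothetical channel model: observed char -> {intended char: P(observed | intended)}.
# 心 (xin) and 兴 (xing) sound similar; the probabilities are made up.
CHANNEL = {"心": {"心": 0.9, "兴": 0.1}}


def build_bigram_lm(corpus, k=0.1):
    """Return a function giving add-k smoothed log P(cur | prev) over characters."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        chars = ["<s>"] + list(sent) + ["</s>"]
        unigrams.update(chars)
        bigrams.update(zip(chars, chars[1:]))
    vocab_size = len(unigrams)

    def logprob(prev, cur):
        return math.log((bigrams[(prev, cur)] + k) / (unigrams[prev] + k * vocab_size))

    return logprob


def correct(sentence, channel, lm_logprob):
    """Viterbi search for the candidate sequence maximizing P(C) * P(S | C)."""
    # best[c] = (log score of the best path ending in candidate c, that path)
    best = {"<s>": (0.0, "")}
    for observed in sentence:
        # Characters without a confusion entry are kept as they are.
        candidates = channel.get(observed, {observed: 1.0})
        new_best = {}
        for cand, p_channel in candidates.items():
            score, path = max(
                (prev_score + lm_logprob(prev, cand), prev_path)
                for prev, (prev_score, prev_path) in best.items()
            )
            new_best[cand] = (score + math.log(p_channel), path + cand)
        best = new_best
    # Add the end-of-sentence transition and return the best full path.
    return max(
        (score + lm_logprob(last, "</s>"), path)
        for last, (score, path) in best.items()
    )[1]


lm_logprob = build_bigram_lm(CORPUS)
print(correct("我今天很高心", CHANNEL, lm_logprob))
```

With this toy data, the language model strongly prefers 高兴 over 高心, which outweighs the channel model's preference for leaving 心 unchanged, so the typo is corrected; in 他很小心 the same character is left alone because 小心 is well attested in the corpus.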