Conference: CIPS-SIGHAN Joint Conference on Chinese Language Processing

Chinese Spelling Error Detection and Correction Based on Language Model, Pronunciation, and Shape

Abstract

Spelling check is an important preprocessing task for user-generated text such as tweets and product comments. Compared with Western languages such as English, Chinese spelling check is more complex because written Chinese has no word delimiters and a misspelled character can only be identified at the word level. Our system works as follows. First, we use character-level n-gram language models to detect potentially misspelled characters, namely those whose probability falls below a predefined threshold. Second, for each potentially incorrect character, we generate a candidate set based on pronunciation and shape similarity. Third, we filter out candidate corrections that cannot form a legal word with their neighbors according to a word dictionary. Finally, we select the candidate with the highest language model probability. If that probability is higher than a predefined threshold, we replace the original character; otherwise, we treat the original character as correct and take no action. Our preliminary experiments show that this simple method achieves relatively high precision but low recall.
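
The four steps described in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the toy corpus, the confusion sets, the dictionary, both threshold values, and all names (`BigramLM`, `forms_word`, `correct`) are hypothetical, and an add-one smoothed character bigram model stands in for whatever n-gram models the paper actually uses.

```python
import math
from collections import Counter

# Toy resources, all hypothetical and far smaller than anything usable in practice.
CORPUS = ["我今天很高兴", "他今天很高兴", "我们今天去学校", "今天天气很好"]
CONFUSION = {"新": ["兴", "心", "欣"]}  # characters similar in pronunciation or shape
DICTIONARY = {"今天", "高兴", "学校", "天气", "我们"}


class BigramLM:
    """Character bigram model with add-one smoothing (an assumed stand-in)."""

    def __init__(self, sentences):
        self.context = Counter()
        self.bigram = Counter()
        for s in sentences:
            chars = ["<s>"] + list(s) + ["</s>"]
            self.context.update(chars[:-1])
            self.bigram.update(zip(chars, chars[1:]))
        # Distinct characters plus the two boundary symbols.
        self.vocab = len({ch for s in sentences for ch in s}) + 2

    def logp(self, prev, cur):
        return math.log((self.bigram[(prev, cur)] + 1) /
                        (self.context[prev] + self.vocab))

    def score(self, chars, i):
        """Local score of chars[i]: log P(c_i | c_{i-1}) + log P(c_{i+1} | c_i)."""
        padded = ["<s>"] + chars + ["</s>"]
        j = i + 1
        return self.logp(padded[j - 1], padded[j]) + self.logp(padded[j], padded[j + 1])


def forms_word(chars, i, cand, dictionary, max_len=4):
    """Step 3: does `cand` at position i form a dictionary word with its neighbors?"""
    trial = chars[:i] + [cand] + chars[i + 1:]
    for start in range(max(0, i - max_len + 1), i + 1):
        for end in range(max(i + 1, start + 2), min(len(trial), start + max_len) + 1):
            if "".join(trial[start:end]) in dictionary:
                return True
    return False


def correct(sentence, lm, detect_threshold, replace_threshold):
    chars = list(sentence)
    for i in range(len(chars)):
        # Step 1: flag characters whose local LM score is below the threshold.
        if lm.score(chars, i) >= detect_threshold:
            continue
        # Step 2: candidates from the confusion set; Step 3: dictionary filter.
        cands = [c for c in CONFUSION.get(chars[i], [])
                 if forms_word(chars, i, c, DICTIONARY)]
        if not cands:
            continue
        # Step 4: keep the best-scoring candidate only if it clears the threshold.
        def cand_score(c):
            return lm.score(chars[:i] + [c] + chars[i + 1:], i)
        best = max(cands, key=cand_score)
        if cand_score(best) > replace_threshold:
            chars[i] = best
    return "".join(chars)


lm = BigramLM(CORPUS)
print(correct("我今天很高新", lm, detect_threshold=-5.0, replace_threshold=-5.0))
```

In this sketch a character is left alone whenever it is not flagged, has no confusion-set entry, or has no candidate that survives the dictionary filter. That conservative behavior is consistent with the precision and recall trade-off the abstract reports, although the thresholds here were picked by hand for the toy data.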
