Chinese Spelling Errors Detection Based on CSLM

Abstract

Spelling errors are common in electronic documents and can sometimes have serious consequences. Methods based on the n-gram language model are the most commonly used approach to this problem. A CSLM (continuous space language model), which represents each word as a vector, differs from these traditional models. In this paper, we experimented with a specific CSLM, the CBOW (Continuous Bag-of-Words) model, to detect spelling errors. Because Chinese spelling errors usually take the form of wrong characters rather than wrong words, we trained character vectors on a large Chinese corpus and then judged whether each Chinese character is correct by its probability of occurrence in the given context. Experimental results show that the CSLM-based method outperforms the n-gram language model.
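The detection idea in the abstract (train character vectors with CBOW, then judge each character by its probability in context) can be sketched at toy scale as follows. This is not the authors' implementation. The full-softmax training loop, the window of 2, the hyperparameters, the class name `CharCBOW`, the helper functions, and the 0.1 flagging threshold are all illustrative assumptions. The model averages the input vectors of the surrounding characters (the CBOW hidden layer) and applies a softmax over the character vocabulary to obtain P(character | context).

```python
import math
import random


def context_ids(sent, pos, window, idx):
    """Indices of in-vocabulary characters within `window` positions of `pos` (excluding pos)."""
    lo, hi = max(0, pos - window), min(len(sent), pos + window + 1)
    return [idx[sent[i]] for i in range(lo, hi) if i != pos and sent[i] in idx]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


class CharCBOW:
    """Hypothetical toy character-level CBOW with a full softmax output layer."""

    def __init__(self, corpus, dim=16, window=2, epochs=300, lr=0.1, seed=0):
        rng = random.Random(seed)
        self.window = window
        self.vocab = sorted({ch for sent in corpus for ch in sent})
        self.idx = {ch: i for i, ch in enumerate(self.vocab)}
        V = len(self.vocab)
        # Input (context) vectors start random; output vectors start at zero.
        self.w_in = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(V)]
        self.w_out = [[0.0] * dim for _ in range(V)]
        for _ in range(epochs):
            for sent in corpus:
                for pos, ch in enumerate(sent):
                    self._sgd_step(sent, pos, self.idx[ch], lr)

    def _hidden(self, ctx):
        # CBOW hidden layer: average of the context characters' input vectors.
        dim = len(self.w_in[0])
        return [sum(self.w_in[c][k] for c in ctx) / len(ctx) for k in range(dim)]

    def _probs(self, h):
        return softmax([sum(w * x for w, x in zip(row, h)) for row in self.w_out])

    def _sgd_step(self, sent, pos, target, lr):
        ctx = context_ids(sent, pos, self.window, self.idx)
        if not ctx:
            return
        h = self._hidden(ctx)
        probs = self._probs(h)
        grad_h = [0.0] * len(h)
        for j, row in enumerate(self.w_out):
            err = probs[j] - (1.0 if j == target else 0.0)  # d(-log p_target)/d(score_j)
            for k in range(len(h)):
                grad_h[k] += err * row[k]  # accumulate with the pre-update output weight
                row[k] -= lr * err * h[k]
        for c in ctx:  # each context vector receives its share of the averaged gradient
            for k in range(len(h)):
                self.w_in[c][k] -= lr * grad_h[k] / len(ctx)

    def char_prob(self, sent, pos):
        """P(sent[pos] | surrounding characters) under the trained model."""
        ch = sent[pos]
        if ch not in self.idx:
            return 0.0  # unseen character: treated as maximally suspicious
        ctx = context_ids(sent, pos, self.window, self.idx)
        if not ctx:
            return 1.0  # no context to judge against: treated as correct
        return self._probs(self._hidden(ctx))[self.idx[ch]]

    def detect(self, sent, threshold=0.1):
        """Positions whose in-context probability falls below the (hypothetical) threshold."""
        return [i for i in range(len(sent)) if self.char_prob(sent, i) < threshold]
```

For example, after training on the two sentences 我爱北京天安门 and 今天天气很好, replacing 安 with 气 (a character the model saw only in a different context) should lower the in-context probability at position 5 of 我爱北京天气门 relative to the original sentence. The context here uses characters on both sides of each position. A realistic setup would train on a large corpus and tune the threshold; word2vec-style implementations also usually replace the full softmax with hierarchical softmax or negative sampling for efficiency.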
