Context-Sensitive Arabic Spell Checker Using Context Words and N-Gram Language Models



Abstract

This paper addresses real-word spell checking using context words and n-gram language models. A corpus covering a range of Arabic topics was collected. Real-word errors are normally addressed with a collection of confusion sets, and twenty-eight confusion sets were chosen for our experiments. These sets were drawn from the words most commonly confused by non-native Arabic speakers and from words misrecognized by OCR. The probabilities of the context words of each confusion set are estimated using a window-based technique. N-gram language models are used to detect real-word errors and, once an error is found, to choose the best correction. We implemented a prototype of an automatic context-sensitive spell checker that detects and corrects real-word errors in Arabic text. The experimental results showed promising correction accuracy.
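
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a toy English corpus and a single hypothetical confusion set (`peace`/`piece`) as stand-ins for the Arabic data, a window of three words on each side, a bigram model, and add-one smoothing. None of these values come from the paper, and all function names are illustrative.

```python
import math
from collections import Counter

# Toy English data standing in for the Arabic corpus (assumption, not from the paper).
CORPUS = [
    "they signed a peace treaty after the war",
    "the peace talks ended the long war",
    "she ate a piece of cake",
    "he cut a piece of bread",
]
CONFUSION_SET = ("peace", "piece")  # hypothetical confusion set
WINDOW = 3                          # assumed window size (words on each side)


def context_counts(sentences, confusion_set, window):
    """Count the words seen within +/- window of each confusion-set word."""
    counts = {w: Counter() for w in confusion_set}
    for sentence in sentences:
        tokens = sentence.split()
        for i, tok in enumerate(tokens):
            if tok in counts:
                neighbours = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
                counts[tok].update(neighbours)
    return counts


def train_bigrams(sentences):
    """Collect unigram and bigram counts with sentence-boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


CONTEXT = context_counts(CORPUS, CONFUSION_SET, WINDOW)
UNIGRAMS, BIGRAMS = train_bigrams(CORPUS)
VOCAB = len(UNIGRAMS)  # vocabulary size used for add-one smoothing


def context_score(candidate, tokens, i):
    """Sum of smoothed log P(context word | candidate) over the window around i."""
    neighbours = tokens[max(0, i - WINDOW):i] + tokens[i + 1:i + WINDOW + 1]
    total = sum(CONTEXT[candidate].values())
    return sum(
        math.log((CONTEXT[candidate][w] + 1) / (total + VOCAB)) for w in neighbours
    )


def bigram_score(candidate, tokens, i):
    """Smoothed bigram log-probability of the sentence with candidate at position i."""
    padded = ["<s>"] + tokens[:i] + [candidate] + tokens[i + 1:] + ["</s>"]
    return sum(
        math.log((BIGRAMS[a, b] + 1) / (UNIGRAMS[a] + VOCAB))
        for a, b in zip(padded, padded[1:])
    )


def correct(sentence, score):
    """Replace each confusion-set word with the member that scores highest."""
    tokens = sentence.split()
    for i, tok in enumerate(tokens):
        if tok in CONFUSION_SET:
            tokens[i] = max(CONFUSION_SET, key=lambda c: score(c, tokens, i))
    return " ".join(tokens)


print(correct("she ate a peace of cake", context_score))
print(correct("she ate a peace of cake", bigram_score))
```

In this sketch a word is flagged as a real-word error whenever another member of its confusion set scores higher in the same position, so detection and correction happen in one step. How the paper combines the context-word and n-gram scores is not stated in the abstract, which is why the two scorers are kept separate here.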
