
Semantic Error Detection and Correction in Bangla Sentence



Abstract

Detection and correction of errors in Bengali text is essential. In general, errors in Bengali text can be classified into non-word errors and semantic errors (also known as context-sensitive errors). To date, automatic correction of semantic errors in Bengali sentences remains challenging, since there has been no significant research on this topic. In this paper, we introduce the concept of semantic error detection and correction, and we develop a method that can detect and correct errors of this kind. Semantic errors include typographical errors, grammatical errors, homophone errors, homonym errors, etc. The goal of this study is to develop an approach that handles multiple semantic errors in a sentence. We use a confused-word list that we built ourselves using edit distance, and we apply a Naïve Bayes classifier to detect and correct typographical and homophone errors. For a candidate word in a sentence, we pick out its confusion set, that is, a collection of words it can be confused with. For each word in the confusion set, we use all the other neighboring words in the sentence as features. We then apply the Naïve Bayes theorem to calculate the probabilities and decide whether the target word is an error. We evaluated our model on 28,057 sentences and achieved more than 90% accuracy. All data corpora used to evaluate the model were built by us. We strongly believe that the solution presented here may contribute significantly to the advancement of Bengali language processing.
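
The pipeline described in the abstract (confusion sets built by edit distance, then a Naïve Bayes decision over the neighboring words) can be illustrated with a minimal sketch. This is not the authors' implementation: the names `edit_distance`, `build_confusion_sets` and `ContextNaiveBayes`, the edit-distance threshold of 1, the add-one smoothing, and the tiny English toy corpus standing in for Bangla sentences are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]


def build_confusion_sets(vocab, max_dist=1):
    """Map each word to the vocabulary words within max_dist edits (itself included).

    max_dist=1 is an assumed threshold, not a value taken from the paper.
    """
    return {w: {v for v in vocab if edit_distance(w, v) <= max_dist} for w in vocab}


class ContextNaiveBayes:
    """Naive Bayes over sentence context: P(word) * product of P(neighbor | word)."""

    def __init__(self, sentences, alpha=1.0):
        self.alpha = alpha                         # add-alpha smoothing (assumption)
        self.word_count = Counter()
        self.context_count = defaultdict(Counter)  # word -> counts of co-occurring words
        for sent in sentences:
            for i, w in enumerate(sent):
                self.word_count[w] += 1
                for c in sent[:i] + sent[i + 1:]:
                    self.context_count[w][c] += 1
        self.vocab = set(self.word_count)
        self.total = sum(self.word_count.values())

    def log_score(self, word, context):
        """Log of prior times smoothed likelihood of every context word."""
        score = math.log(self.word_count[word] / self.total)
        denom = sum(self.context_count[word].values()) + self.alpha * len(self.vocab)
        for c in context:
            score += math.log((self.context_count[word][c] + self.alpha) / denom)
        return score

    def correct(self, sentence, confusion_sets):
        """Replace each word by the best-scoring member of its confusion set."""
        out = list(sentence)
        for i, w in enumerate(sentence):
            candidates = confusion_sets.get(w, {w})
            if len(candidates) < 2:
                continue  # unknown words (non-word errors) are out of scope here
            context = sentence[:i] + sentence[i + 1:]
            # On a tie, prefer keeping the original word.
            out[i] = max(sorted(candidates),
                         key=lambda c: (self.log_score(c, context), c == w))
        return out


# Toy English corpus standing in for a Bangla training corpus (assumption).
corpus = [
    ["we", "ate", "sweet", "dessert"],
    ["we", "ate", "dessert", "today"],
    ["the", "desert", "is", "hot"],
    ["sand", "covers", "the", "desert"],
]
model = ContextNaiveBayes(corpus)
confusion_sets = build_confusion_sets(model.vocab)
fixed = model.correct(["we", "ate", "sweet", "desert"], confusion_sets)
print(fixed)
```

A word is flagged as a semantic error when the best-scoring candidate differs from the word actually written. Because every position in the sentence is checked against its own confusion set, the same loop covers sentences with more than one error, although each decision in this sketch still relies on the uncorrected neighbors.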
