
Semantic Error Detection and Correction in Bangla Sentence


Abstract

Detecting and correcting errors in Bengali text is essential. In general, Bengali text errors can be classified into non-word errors and semantic errors (also known as context-sensitive errors). To date, automatic correction of semantic errors in Bengali sentences remains challenging, since there has been no significant research on this topic. In this paper, we introduce the concept of semantic error detection and correction and present a method that can detect and correct errors of this kind. Semantic errors include typographical errors, grammatical errors, homophone errors, homonym errors, etc. The goal of this study is to develop an approach that handles multiple semantic errors in a sentence. We use our own confused-word list, built using edit distance, and apply a Naïve Bayes classifier to detect and correct typographical and homophone errors. For each candidate word in a sentence, we pick out its confusion set, that is, the collection of words it may be confused with. For each word in the confusion set, we use all other neighboring words in the sentence as features. We then apply Bayes' theorem under the naïve independence assumption to calculate the probability of each word and decide whether the target word is an error. We evaluated our model on 28,057 sentences and achieved more than 90% accuracy. All data corpora used to evaluate the model were built by us. We strongly believe that our solution to this problem may contribute significantly to the advancement of Bengali language processing.
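
The pipeline described in the abstract (confusion sets built by edit distance, then a Naïve Bayes decision over the neighboring words) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the English toy corpus, the edit-distance threshold of 1, the add-one smoothing, and all function and class names (edit_distance, build_confusion_sets, ContextNaiveBayes, correct) are assumptions made for this example.

```python
import math
from collections import Counter, defaultdict


def edit_distance(a, b):
    """Levenshtein distance via the standard dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def build_confusion_sets(vocab, max_dist=1):
    """Map each word to the other vocabulary words within max_dist edits.
    The threshold of 1 is an assumption; the abstract does not state one."""
    return {w: {v for v in vocab if v != w and edit_distance(w, v) <= max_dist}
            for w in vocab}


class ContextNaiveBayes:
    """Counts how often each word occurs and which words share its sentence."""

    def __init__(self, sentences):
        self.word_count = Counter()
        self.context_count = defaultdict(Counter)
        for sent in sentences:
            for i, w in enumerate(sent):
                self.word_count[w] += 1
                self.context_count[w].update(sent[:i] + sent[i + 1:])
        self.vocab = set(self.word_count)
        self.total = sum(self.word_count.values())

    def log_score(self, candidate, context):
        """log P(candidate) + sum of log P(c | candidate), add-one smoothed."""
        v = len(self.vocab)
        ctx = self.context_count.get(candidate, Counter())
        ctx_total = sum(ctx.values())
        score = math.log((self.word_count[candidate] + 1) / (self.total + v))
        for c in context:
            score += math.log((ctx[c] + 1) / (ctx_total + v))
        return score


def correct(sentence, model, confusion):
    """Return the corrected sentence and the positions flagged as errors.
    A word is replaced only if a confusable word scores strictly higher."""
    fixed, flagged = [], []
    for i, word in enumerate(sentence):
        context = sentence[:i] + sentence[i + 1:]
        best, best_score = word, model.log_score(word, context)
        for cand in sorted(confusion.get(word, ())):
            s = model.log_score(cand, context)
            if s > best_score:
                best, best_score = cand, s
        if best != word:
            flagged.append(i)
        fixed.append(best)
    return fixed, flagged


# Toy English corpus standing in for a Bengali training corpus (assumption).
corpus = [s.split() for s in (
    "we ate sweet dessert after dinner",
    "you ordered chocolate dessert after dinner",
    "the camel crossed the dry desert",
    "sand covers the hot dry desert",
)]
model = ContextNaiveBayes(corpus)
confusion = build_confusion_sets(model.vocab)
fixed, flagged = correct("he ate sweet desert after dinner".split(),
                         model, confusion)
print(" ".join(fixed), flagged)
```

Because every position in the sentence is scored independently against its own confusion set, the same loop can flag more than one error per sentence, which matches the stated goal of handling multiple semantic errors. A real system would need a far larger corpus and a tuned confusion list for the context statistics to be reliable.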
