ACM Transactions on Asian Language Information Processing

'UTTAM': An Efficient Spelling Correction System for Hindi Language Based on Supervised Learning

Abstract

In this article, we propose "UTTAM," a system that uses supervised learning to correct spelling errors in Hindi text. Unlike many other languages, Hindi has a large character set, inflected words, complex characters, and sets of phonetically similar characters. This complexity increases the chance of confusion and occasionally leads a writer to enter a wrong character in a word. Spelling errors in text significantly reduce the accuracy of tools that depend on that text, such as search engines and text editors. The proposed work is the first approach to correct non-word (out-of-vocabulary) errors and real-word errors simultaneously in a Hindi sentence. The method studies human behavior, i.e., the types and frequencies of spelling errors that people make in Hindi text. Based on these types and frequencies, the heterogeneous data is collected in matrices, which are then used to generate suitable candidate words for each input word. After candidate generation, the Viterbi algorithm is applied to find the best sequence of candidate words for correcting the input sentence. For Hindi, this work is the first attempt at real-word error correction. For non-word errors, the experiments show that "UTTAM" performs better than the existing systems SpellGuru and Saksham.
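The Viterbi step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring model, so everything here is an assumption: the function name `viterbi_correct`, the use of bigram probabilities as transition scores, the per-candidate error probabilities standing in for the values drawn from the error matrices, and the `floor` value used for unseen bigrams. The toy data uses English stand-in words, not data from the paper.

```python
import math

START = "<s>"  # sentence-start marker (assumed convention)

def viterbi_correct(observed, candidates, bigram, floor=1e-6):
    """Return the most probable sequence of candidate words for `observed`.

    candidates: token -> list of (candidate, P(token | candidate)), all > 0
    bigram:     (previous_word, word) -> P(word | previous_word)
    floor:      probability used for unseen bigrams (crude smoothing)
    """
    layers = []  # layers[i][cand] = (best log-score ending in cand, backpointer)
    for i, token in enumerate(observed):
        layer = {}
        # Tokens without a candidate list are kept unchanged.
        for cand, p_err in candidates.get(token, [(token, 1.0)]):
            emit = math.log(p_err)
            if i == 0:
                start = math.log(bigram.get((START, cand), floor))
                layer[cand] = (start + emit, None)
            else:
                # Pick the best predecessor among the previous token's candidates.
                prev, score = max(
                    ((p, s + math.log(bigram.get((p, cand), floor)))
                     for p, (s, _) in layers[-1].items()),
                    key=lambda item: item[1],
                )
                layer[cand] = (score + emit, prev)
        layers.append(layer)
    if not layers:
        return []
    # Backtrace from the best-scoring word in the last layer.
    word = max(layers[-1], key=lambda w: layers[-1][w][0])
    path = [word]
    for i in range(len(layers) - 1, 0, -1):
        word = layers[i][word][1]
        path.append(word)
    return path[::-1]

# Hypothetical toy data: "red" is a real-word error, "bok" a non-word error.
observed = ["i", "red", "a", "bok"]
candidates = {
    "red": [("red", 0.6), ("read", 0.4)],
    "bok": [("book", 0.7), ("box", 0.3)],
}
bigram = {
    (START, "i"): 0.5, ("i", "read"): 0.2, ("i", "red"): 0.001,
    ("read", "a"): 0.3, ("red", "a"): 0.05,
    ("a", "book"): 0.1, ("a", "box"): 0.05,
}
print(viterbi_correct(observed, candidates, bigram))  # ['i', 'read', 'a', 'book']
```

Because the score of each candidate depends on its neighbors, the same pass can replace a valid but contextually unlikely word ("red") and an out-of-vocabulary word ("bok"), which is the sense in which both error types are handled in one sentence-level search.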
