'UTTAM': An Efficient Spelling Correction System for Hindi Language Based on Supervised Learning

Jain Amita; Jain Minni; Jain Goonjan; Tayal Devendra K.

Abstract

In this article, we propose a system called "UTTAM" for correcting spelling errors in Hindi text using supervised learning. Compared with many other languages, Hindi has a large character set, words with inflections and complex characters, sets of phonetically similar characters, and so on. This complexity increases the chance of confusion and occasionally leads a writer to enter a wrong character in a word. Spelling errors in text significantly reduce the accuracy of tools that rely on it, such as search engines and text editors. The proposed work is the first approach to correct non-word (out-of-vocabulary) errors and real-word errors simultaneously in a Hindi sentence. The proposed method studies human behavior, i.e., the types and frequencies of the spelling errors people make in Hindi text. Based on these types and frequencies, heterogeneous data is collected in matrices, and this data is used to generate suitable candidate words for each input word. After the candidate words are generated, the Viterbi algorithm performs the word correction by finding the best sequence of candidate words for the input sentence. For Hindi, this work is the first attempt at real-word error correction. For non-word errors, experiments show that "UTTAM" performs better than the existing systems SpellGuru and Saksham.
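The abstract describes two stages: candidate generation from error-frequency data, then Viterbi decoding over candidate sequences. The following is a minimal Python sketch of that two-stage idea. It is a hypothetical toy, not UTTAM's implementation, and it makes three simplifying assumptions:

- every error is a single-character substitution;
- the paper's error matrices are reduced to a dict of substitution counts (`sub_counts`);
- sentence context is scored with an add-one-smoothed bigram model.

The function names, the constants `P_SAME` and `ALPHABET_SIZE`, and the romanized toy data are all invented for illustration.

```python
import math

# Hypothetical parameters (assumptions for this sketch, not values from the paper).
P_SAME = 0.95        # prior that a typed word is exactly what the writer intended
ALPHABET_SIZE = 50   # rough character-set size, used only for add-one smoothing


def one_substitution(a, b):
    """True if b differs from a in exactly one character position."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def channel_logprob(typed, intended, sub_counts):
    """log P(typed | intended) under a toy single-substitution error model.

    sub_counts[(typed_char, intended_char)] plays the role of an "error matrix":
    how often typed_char was entered when intended_char was meant.
    """
    if typed == intended:
        return math.log(P_SAME)
    # Callers only pass one-substitution neighbours, so exactly one position differs.
    t_ch, i_ch = next((t, i) for t, i in zip(typed, intended) if t != i)
    total = sum(c for (_, i), c in sub_counts.items() if i == i_ch)
    # Add-one smoothing keeps unseen confusions at a small non-zero probability.
    p = (1 - P_SAME) * (sub_counts.get((t_ch, i_ch), 0) + 1) / (total + ALPHABET_SIZE)
    return math.log(p)


def candidates(typed, lexicon, sub_counts):
    """Map each candidate correction to its channel log-probability.

    A typed word that is in the lexicon stays a candidate (it may be correct),
    but its one-substitution neighbours compete with it (real-word errors).
    """
    cands = {w: channel_logprob(typed, w, sub_counts)
             for w in lexicon if w == typed or one_substitution(typed, w)}
    # Unknown word with no neighbour in the lexicon: pass it through unchanged.
    return cands or {typed: 0.0}


def bigram_logprob(prev, word, bigrams, unigrams, vocab_size):
    """Add-one smoothed log P(word | prev)."""
    return math.log((bigrams.get((prev, word), 0) + 1) /
                    (unigrams.get(prev, 0) + vocab_size))


def viterbi_correct(sentence, lexicon, sub_counts, bigrams, unigrams):
    """Return the highest-scoring sequence of candidate words for `sentence`."""
    vocab_size = len(lexicon) + 1  # +1 for the sentence-start token "<s>"
    # best[word] = (log score, path) of the best path ending in `word`.
    best = {"<s>": (0.0, [])}
    for typed in sentence:
        new_best = {}
        for cand, emit in candidates(typed, lexicon, sub_counts).items():
            # Extend every surviving path; keep only the best one per candidate.
            score, path = max(
                (s + bigram_logprob(prev, cand, bigrams, unigrams, vocab_size) + emit, p)
                for prev, (s, p) in best.items())
            new_best[cand] = (score, path + [cand])
        best = new_best
    return max(best.values())[1]


if __name__ == "__main__":
    # Toy romanized data, invented for illustration only.
    lexicon = {"kal", "kam", "ghar"}
    sub_counts = {("m", "l"): 8}  # "m" typed for an intended "l" 8 times
    unigrams = {"<s>": 100, "kal": 50, "kam": 50, "ghar": 40}
    bigrams = {("<s>", "kal"): 50, ("kal", "ghar"): 40}
    print(viterbi_correct(["kam", "ghar"], lexicon, sub_counts, bigrams, unigrams))
    # -> ['kal', 'ghar']  (real-word error corrected from context)
    print(viterbi_correct(["ghaz"], lexicon, sub_counts, bigrams, unigrams))
    # -> ['ghar']  (non-word error)
```

Keeping the typed word itself as a candidate is what lets one decoder handle both error types. Here, "kam" is a valid word, yet the bigram context "kal ghar" outweighs the channel model's preference for leaving it unchanged. A lone "kam" with no supporting context is kept as typed. A more realistic sketch would likely also model insertions, deletions, and transpositions, and would operate on Devanagari text rather than romanized placeholders.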

Bibliographic details

Source
ACM Transactions on Asian Language Information Processing | 2019, Issue 1 | pp. 8.1-8.26 | 26 pages
Authors
Jain Amita; Jain Minni; Jain Goonjan; Tayal Devendra K.
Affiliations

Ambedkar Institute of Advanced Communication Technologies and Research, Department of Computer Science and Engineering, Delhi, India

Delhi Technological University, Department of Computer Science and Engineering, Delhi, India

Delhi Technological University, Department of Applied Mathematics, Delhi, India

Indira Gandhi Delhi Technical University for Women, Department of Computer Science and Engineering, Delhi, India

Indexing information
Original format: PDF
Language: English
Keywords
Natural language processing; spelling correction; Hindi language; Viterbi algorithm