Towards Indian language spell-checker design

Abstract

This paper deals with the development of spell-checkers for Indian languages, using as an example Bangla, the second most popular language in the Indian subcontinent. A brief review of the problems and the current state of Indian language spell-checkers is given. The approach taken for a Bangla spell-checker is then elaborated. In this approach the technique works in two stages. The first stage handles phonetic similarity errors. To do so, phonetically similar characters are mapped to single units of character code, and a new dictionary $D_c$ is constructed over this reduced alphabet. A phonetically similar but wrongly spelt word can be easily corrected using this dictionary. The second stage handles errors other than phonetic similarity. Here a wrongly spelt word $S$ of $n$ characters is searched in the dictionary $D_c$. If $S$ is a nonword, its first $k_1 \le n$ characters will match the beginning of a valid word in $D_c$ (if $k_1 = n$, the word in $D_c$ must be longer than $n$). A reversed word dictionary $D_r$ is also generated, in which the characters of each word are stored in reverse order. If the last $k_2$ characters of $S$ match a word in $D_r$, then, for a single error, the error is located within the intersection region of the first $k_1 + 1$ and the last $k_2 + 1$ characters of $S$. We observed that in most cases this region is very small compared to the word length, and the number of suggested correct words can be drastically reduced using this information. We have used our approach to correct Bangla text, where the problem of inflection is tackled by a simplified version of a morphological analyser. Another problem encountered in Indian languages is the large number of compound words formed by euphony and assimilation. The problem of compound words is also carefully tackled.
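The two-stage procedure described in the abstract can be sketched in Python. This is a minimal illustration under stated assumptions, not the author's implementation: the phonetic classes in `PHONETIC_CLASS`, the toy Latin-transliterated word list, and all function names are hypothetical, and the paper's handling of inflection and compound words is omitted. Stage 2 follows the abstract's localization rule: with $k_1$ the longest prefix of $S$ found in $D_c$ and $k_2$ the longest suffix of $S$ found via $D_r$, a single error lies in the overlap of the first $k_1 + 1$ and the last $k_2 + 1$ characters, so candidate edits are generated only inside that window.

```python
from bisect import bisect_left

# Hypothetical phonetic classes over a toy Latin transliteration, loosely
# inspired by Bangla sibilants (S, x -> s), nasals (N -> n) and long vowels
# (I -> i, U -> u). This is an assumption, not the paper's actual mapping.
PHONETIC_CLASS = {"S": "s", "x": "s", "N": "n", "I": "i", "U": "u"}


def reduce_phonetic(word):
    """Stage 1: replace each character by the code of its phonetic class."""
    return "".join(PHONETIC_CLASS.get(ch, ch) for ch in word)


def build_dictionaries(words):
    """Build D_c (reduced form -> original words) plus sorted forward keys
    and D_r (the same keys spelled backwards) for prefix searches."""
    d_c = {}
    for w in words:
        d_c.setdefault(reduce_phonetic(w), []).append(w)
    forward = sorted(d_c)
    reverse = sorted(key[::-1] for key in d_c)
    return d_c, forward, reverse


def has_prefix(sorted_keys, prefix):
    """True if some key in the sorted list starts with `prefix`."""
    i = bisect_left(sorted_keys, prefix)
    return i < len(sorted_keys) and sorted_keys[i].startswith(prefix)


def longest_prefix_match(sorted_keys, s):
    """Largest k such that s[:k] is a prefix of some dictionary key."""
    k = 0
    while k < len(s) and has_prefix(sorted_keys, s[:k + 1]):
        k += 1
    return k


def error_region(key, forward, reverse):
    """Stage 2: 0-based inclusive bounds of the overlap of the first k1 + 1
    and last k2 + 1 characters, which contains any single substitution or
    deletion error."""
    n = len(key)
    k1 = longest_prefix_match(forward, key)        # prefix match in D_c
    k2 = longest_prefix_match(reverse, key[::-1])  # suffix match via D_r
    return max(0, n - k2 - 1), min(n - 1, k1)


def suggest(word, d_c, forward, reverse):
    """Return dictionary words that match phonetically, or that are one
    substitution, deletion or insertion away inside the error region."""
    key = reduce_phonetic(word)
    if key in d_c:                                  # stage 1 hit
        return sorted(d_c[key])
    lo, hi = error_region(key, forward, reverse)
    alphabet = set("".join(forward))
    candidates = set()
    for p in range(lo, hi + 1):
        candidates.add(key[:p] + key[p + 1:])       # delete key[p]
        for ch in alphabet:
            candidates.add(key[:p] + ch + key[p + 1:])  # substitute key[p]
    for p in range(lo, hi + 2):                     # insertion points run one past the region
        for ch in alphabet:
            candidates.add(key[:p] + ch + key[p:])
    return sorted(w for c in candidates for w in d_c.get(c, []))


# Hypothetical toy dictionary in Latin transliteration.
WORDS = ["bhaSa", "desh", "nodI", "ghor", "kotha"]
D_C, FWD, REV = build_dictionaries(WORDS)

print(suggest("bhasa", D_C, FWD, REV))     # ['bhaSa']  (stage 1, phonetic)
print(error_region("dash", FWD, REV))      # (1, 1)
print(suggest("dash", D_C, FWD, REV))      # ['desh']   (substitution)
print(suggest("ghr", D_C, FWD, REV))       # ['ghor']   (insertion)
```

In the `"dash"` example, $k_1 = 1$ and $k_2 = 2$, so only position 1 is edited instead of all four. Transpositions are not generated here, and candidates are returned alphabetically rather than ranked; a fuller sketch would add both.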

Bibliographic record

Source: Language Engineering Conference, 2003, 8 pages
Author: Bidyut Baran Chaudhuri
Original format: PDF
CLC classification: Automation technology, computer technology

Similar documents

1. Vafa spell-checker for detecting spelling, grammatical, and real-word errors of Persian language [J]. Faili Heshaam, Ehsan Nava, Montazery Mortaza. Literary & Linguistic Computing, 2016, no. 1
2. Design and Development of a Computer Vision Algorithm and Tool for Currency Recognition in Indian Vernacular Languages for Visually Challenged People [J]. Vishwas Raval. Electronic Letters on Computer Vision and Image Analysis: ELCVIA, 2019, no. 2
3. An Approach to Design Virtual Keyboards for Text Composition in Indian Languages [J]. Debasis Samanta, Sayan Sarcar, Soumalya Ghosh. International Journal of Human-Computer Interaction, 2013, no. 7a9
4. Towards Indian language spell-checker design [C]. Chaudhuri, B.B. 2003
5. Design and Development of a Computer Vision Algorithm and Tool for Currency Recognition in Indian Vernacular Languages for Visually Challenged People [D]. Raval, Vishwas Jayantilal. 2018
6. Translation and Adaptation of Five English Language Self-Report Health Measures to South Indian Kannada Language [O]. Spoorthi Thammaiah, Vinaya Manchaiah, Vijayalakshmi Easwar. 2016
7. Captioning and Indian Sign Language as Accessibility Tools in Universal Design [O]. Mathew Martin, Poothullil John, Sahasrabudhe, Sujit, Chavan, Prashant D. 2013
