edATLAS: An Efficient Disambiguation Algorithm for Texting in Languages with Abugida Scripts

Abstract

Abugida refers to a phonogram writing system where each syllable is represented using a single consonant or typographic ligature, along with a default vowel or optional diacritic(s) to denote other vowels. However, texting in these languages has some unique challenges in spite of the advent of devices with soft keyboard supporting custom key layouts. The number of characters in these languages is large enough to require characters to be spread over multiple views in the layout. Having to switch between views many times to type a single word hinders the natural thought process. This prevents popular usage of native keyboard layouts. On the other hand, supporting romanized scripts (native words transcribed using Latin characters) with language model based suggestions is also set back by the lack of uniform romanization rules. To this end, we propose a disambiguation algorithm and showcase its usefulness in two novel mutually non-exclusive input methods for languages natively using the abugida writing system: (a) disambiguation of ambiguous input for abugida scripts, and (b) disambiguation of word variants in romanized scripts. We benchmark these approaches using public datasets, and show an improvement in typing speed by 19.49%, 25.13%, and 14.89%, in Hindi, Bengali, and Thai, respectively, using Ambiguous Input, owing to the human ease of locating keys combined with the efficiency of our inference method. Our Word Variant Disambiguation (WDA) maps valid variants of romanized words, previously treated as Out-of-Vocab, to a vocabulary of 100k words with high accuracy, leading to an increase in Error Correction F1 score by 10.03% and Next Word Prediction (NWP) by 62.50% on average.
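The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea behind input method (a), ambiguous input: several characters share one key, and a typed key sequence is resolved to vocabulary words. The key grouping (`KEY_GROUPS`), the toy vocabulary with its frequencies (`VOCAB`), and the function names are all hypothetical, and the dictionary lookup ranked by frequency is a stand-in technique, not the inference method of the paper.

```python
from collections import defaultdict

# Hypothetical layout: each key carries several Devanagari consonants.
# The real edATLAS layout is not given in the abstract.
KEY_GROUPS = {1: "कखगघ", 2: "नपफम", 3: "यरलव"}
CHAR_TO_KEY = {ch: key for key, chars in KEY_GROUPS.items() for ch in chars}

# Toy vocabulary with made-up frequencies (assumption, for illustration only).
VOCAB = {"कम": 50, "गम": 20, "नल": 10, "पल": 30}


def key_sequence(word):
    """Return the tuple of key ids a user would press to type `word`."""
    return tuple(CHAR_TO_KEY[ch] for ch in word)


def build_index(vocab):
    """Group vocabulary words by key sequence, most frequent first."""
    index = defaultdict(list)
    for word, freq in vocab.items():
        index[key_sequence(word)].append((freq, word))
    for candidates in index.values():
        candidates.sort(reverse=True)
    return index


def disambiguate(keys, index):
    """Return candidate words for an ambiguous key sequence."""
    return [word for _, word in index.get(tuple(keys), [])]


INDEX = build_index(VOCAB)
print(disambiguate([1, 2], INDEX))  # candidates sharing the key sequence (1, 2)
```

Because each key press is ambiguous, the user never has to switch views to reach a specific character; the cost moves to the lookup, which here is a single dictionary access per typed sequence.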


Bibliographic record

Source
IEEE International Conference on Semantic Computing | 2021 | pp. 325-332 | 8 pages
Authors
Sourav Ghosh; Sourabh Vasant Gothe; Chandramouli Sanchi; Barath Raj Kandur Raja
Original format: PDF
Keywords
Vocabulary; Layout; Sociology; Keyboards; Switches; Writing; Error correction


