Conference: IEEE International Conference on Semantic Computing

edATLAS: An Efficient Disambiguation Algorithm for Texting in Languages with Abugida Scripts

Abstract

Abugida refers to a phonogram writing system in which each syllable is represented by a single consonant or typographic ligature, with a default vowel or optional diacritic(s) to denote other vowels. Texting in these languages poses unique challenges despite the advent of devices with soft keyboards that support custom key layouts. These languages have so many characters that they must be spread over multiple views of the layout, and having to switch between views many times to type a single word interrupts the natural thought process. This discourages widespread use of native keyboard layouts. On the other hand, supporting romanized scripts (native words transcribed using Latin characters) with language-model-based suggestions is hampered by the lack of uniform romanization rules. To this end, we propose a disambiguation algorithm and showcase its usefulness in two novel, mutually non-exclusive input methods for languages natively written in abugida scripts: (a) disambiguation of ambiguous input for abugida scripts, and (b) disambiguation of word variants in romanized scripts. We benchmark these approaches on public datasets and show that Ambiguous Input improves typing speed by 19.49%, 25.13%, and 14.89% in Hindi, Bengali, and Thai, respectively, owing to the ease with which users locate keys combined with the efficiency of our inference method. Our Word Variant Disambiguation (WDA) maps valid variants of romanized words, previously treated as out-of-vocabulary, to a 100k-word vocabulary with high accuracy, increasing the Error Correction F1 score by 10.03% and Next Word Prediction (NWP) by 62.50% on average.
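The abstract does not describe edATLAS's inference procedure, so what follows is only a minimal sketch of the two problem settings it targets, not the paper's method. It assumes a hypothetical ambiguous layout (`KEY_LAYOUT`) in which one key covers several Devanagari consonants, a toy vocabulary with made-up frequencies (`VOCAB`, `ROMANIZED_VOCAB`), and a deliberately crude romanization normalizer (`skeleton`, which lowercases, treats `w` as `v`, and collapses repeated letters). Ranking candidates by unigram frequency stands in for the language-model scoring a real input method would use; all function names are illustrative.

```python
import re
from collections import defaultdict

# (a) Ambiguous input: one key press stands for several characters.
# Hypothetical layout, NOT the one used in the paper.
KEY_LAYOUT = {
    "1": "कखगघ",
    "2": "चछजझ",
    "3": "नमलर",
}
CHAR_TO_KEY = {ch: key for key, chars in KEY_LAYOUT.items() for ch in chars}

# Toy vocabulary with made-up frequencies (placeholders, not corpus counts).
VOCAB = {"कमल": 120, "कलम": 300, "नमक": 80, "मगर": 50}


def encode(word: str) -> str:
    """Key sequence a user would press for `word` (KeyError if a char is off-layout)."""
    return "".join(CHAR_TO_KEY[ch] for ch in word)


def build_index(vocab: dict[str, int]) -> dict[str, list[str]]:
    """Group words by their key sequence, most frequent word first."""
    index: dict[str, list[str]] = defaultdict(list)
    for word in sorted(vocab, key=vocab.get, reverse=True):
        index[encode(word)].append(word)
    return dict(index)


def disambiguate(keys: str, index: dict[str, list[str]]) -> list[str]:
    """Candidate words for an ambiguous key sequence, ranked by frequency."""
    return index.get(keys, [])


# (b) Word variants in romanized text: many spellings, one vocabulary entry.
ROMANIZED_VOCAB = ["pyaar", "kya", "hai", "theek"]


def skeleton(romanized: str) -> str:
    """Crude normalization: lowercase, treat 'w' as 'v', collapse repeated letters."""
    s = romanized.lower().replace("w", "v")
    return re.sub(r"(.)\1+", r"\1", s)


def build_variant_map(canonical_words: list[str]) -> dict[str, list[str]]:
    """Map each skeleton to the canonical spelling(s) that produce it."""
    variants: dict[str, list[str]] = defaultdict(list)
    for word in canonical_words:
        variants[skeleton(word)].append(word)
    return dict(variants)


def resolve_variant(token: str, variant_map: dict[str, list[str]]) -> list[str]:
    """In-vocabulary spellings for a possibly non-standard romanized token."""
    return variant_map.get(skeleton(token), [])


if __name__ == "__main__":
    index = build_index(VOCAB)
    print(disambiguate(encode("कमल"), index))  # both words share key sequence "133"
    variant_map = build_variant_map(ROMANIZED_VOCAB)
    print(resolve_variant("Pyaaar", variant_map))  # maps to the canonical "pyaar"
```

Here "कमल" and "कलम" collide on the same key sequence, which is exactly the ambiguity the disambiguator must resolve, and "Pyaaar" would be out-of-vocabulary without normalization. A practical system would additionally need prefix completion for partially typed words, vowel signs (matras) and conjuncts on the layout, and context-aware ranking.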