Journal: International journal on digital libraries

Transliterating Latin to Amharic scripts using user-defined rules and character mappings

Abstract

As social media platforms become increasingly accessible, the use of new forms of textual communication (posts, comments, chats, etc.) in local-language scripts such as Amharic has increased tremendously. However, many users prefer to post comments in Latin script rather than the local one, because Latin keyboards offer more convenient character input. Existing Latin-to-Amharic transliteration systems do not account for double consonants and double vowels, which causes transliteration errors. Further, because existing systems follow multiple character-mapping conventions, social media texts reflect a wide variety of conventions adopted by users during script production. Current systems fail to address these gaps and user conventions. In this work, we present RBLatAm (Rule-Based Latin to Amharic), a generic rule-based transliteration system that converts Amharic words written in Latin script back into their native Amharic script. The system is based on mapping rules engineered from three existing transliteration systems (Microsoft, Google, SERA), together with additional rules for double consonants and for conventions adopted on social media by speakers of Amharic. When tested on transliterated Amharic words that are not named entities and on named entities of persons, the system achieves an accuracy of 75.8% and 84.6%, respectively. The system also correctly transliterates words reported as errors in previous studies. This system drastically improves the basis for text-mining research on Amharic, because such texts can be processed even if they were originally produced in Latin script.
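The general idea of rule-based transliteration with character mappings can be illustrated with a minimal sketch. It is not the RBLatAm rule set: the mapping table is a tiny hypothetical sample in a SERA-like style, the function names are invented for illustration, and the double-consonant rule shown here (collapsing a doubled consonant to one, on the assumption that gemination is not written in Amharic script) is an assumption rather than the rule described in the paper.

```python
# Hypothetical, deliberately tiny mapping table (SERA-like style).
# It is NOT the rule set used by RBLatAm.
MAPPING = {
    "se": "ሰ", "sa": "ሳ", "s": "ስ",
    "le": "ለ", "la": "ላ", "l": "ል",
    "me": "መ", "ma": "ማ", "m": "ም",
}

# Consonants covered by the sample table above.
CONSONANTS = set("slm")


def collapse_double_consonants(word):
    """Assumed rule: reduce a doubled consonant (e.g. 'll') to a single one."""
    out = []
    for ch in word:
        if out and ch == out[-1] and ch in CONSONANTS:
            continue  # skip the repeated consonant
        out.append(ch)
    return "".join(out)


def transliterate(word, mapping=MAPPING):
    """Greedy longest-match conversion of a Latin-script word."""
    word = collapse_double_consonants(word.lower())
    longest = max(len(key) for key in mapping)
    result = []
    i = 0
    while i < len(word):
        # Try the longest candidate key first, then shorter ones.
        for size in range(min(longest, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in mapping:
                result.append(mapping[chunk])
                i += size
                break
        else:
            # No rule matched: keep the character unchanged.
            result.append(word[i])
            i += 1
    return "".join(result)


print(transliterate("selam"))   # ሰላም
print(transliterate("sellam"))  # ሰላም (doubled 'l' collapsed first)
```

Trying the longest key first lets a consonant-plus-vowel pair such as "se" take priority over the bare consonant "s". A real system would need a far larger table and rules for the competing conventions the abstract mentions.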
