Indian International Conference on Artificial Intelligence

Language Independent Named Entity Transliteration

Abstract

This paper reports on a modified joint source-channel model that has been used, along with a number of alternatives, to generate English transliterations of Bengali, Hindi and Telugu named entities (NEs). The Bengali, Hindi and Telugu NEs are divided into Transliteration Units (TUs) of the form C⁺M, where C represents a vowel, a consonant or a conjunct and M represents the vowel modifier, or matra. An English word is divided into TUs of the form C*V*, where C represents a consonant and V represents a vowel. The system learns mappings automatically from bilingual training sets of NEs, particularly person and location names. The output of this mapping process is a decision-list classifier that pairs collocated TUs in the source language with their equivalent collocated TUs in the target language, along with the probability of each decision as obtained from the training set. Evaluation of the proposed models in terms of Word Agreement Ratio (WAR) and Transliteration Unit Agreement Ratio (TUAR) showed that the modified joint source-channel model performs best for Bengali, Hindi and Telugu to English transliteration. All the transliteration models have also been tested for English to Bengali transliteration, where the modified joint source-channel model again performs best on both evaluation parameters, WAR and TUAR. The Bengali to English (B2E) and English to Bengali (E2B) transliteration systems have additionally been tested with linguistic knowledge in the form of conjuncts and diphthongs in Bengali and their corresponding representations in English. With this linguistic knowledge included, the modified joint source-channel model performs best.
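
As an illustration of the English-side segmentation described in the abstract, the following minimal Python sketch splits a word into TUs matching the C*V* pattern, and computes a word-level agreement score. It is not the authors' implementation. The function names are hypothetical, treating "y" as a consonant is an assumption, and the abstract does not define WAR, so the exact-match fraction used here is only one plausible reading of that metric.

```python
import re

# Assumption: vowels are a, e, i, o, u; every other letter (including "y")
# counts as a consonant. Each TU is a run of consonants followed by a run of
# vowels (C*V*), with empty matches excluded.
_TU_PATTERN = re.compile(r"[b-df-hj-np-tv-z]+[aeiou]*|[aeiou]+")


def segment_english_tus(word):
    """Split an English word into C*V* transliteration units."""
    return _TU_PATTERN.findall(word.lower())


def word_agreement_ratio(predicted, reference):
    """Fraction of predicted words that exactly match the reference.

    Assumption: this exact-match reading of WAR is illustrative only;
    the abstract does not give the formula.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    if not reference:
        raise ValueError("reference must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return matches / len(reference)


if __name__ == "__main__":
    print(segment_english_tus("Kolkata"))
    print(word_agreement_ratio(["kolkata", "dilli"], ["kolkata", "delhi"]))
```

A regular expression is used here because the C*V* pattern is itself regular. The Indic-side C⁺M segmentation would additionally need script-specific handling of conjuncts and matras, which this sketch does not attempt.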
