International Journal of Knowledge-Based and Intelligent Engineering Systems

Resource creation and development of an English-Bangla back transliteration system


Abstract

In this paper we present the development of an English-Bangla transliteration parallel corpus and use it to develop and evaluate some popular computational models for transliterating Bangla text written in Romanized English back to its original script. To this end, we have developed different techniques to generate an English-Bangla parallel transliterated lexicon of around 100,000 words. The proposed lexicon of English-Bangla transliterated word pairs, along with some language-specific orthographic and phonetic rules, is used to develop two computational models, namely the joint source channel model and the phrase-based SMT model, which automatically identify, extract, and learn transliteration unit (TU) pairs from the source and target language words. Both models are used to predict the top 5 candidate outputs for a given input text, and both have been evaluated on a set of 20,000 Romanized Bangla test words. Our initial evaluation results show that the performance of the SMT model slightly surpasses that of the joint source channel model.
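As a rough illustration of the joint source channel idea mentioned in the abstract, the sketch below scores a Romanized word as a chain of TU pairs, each conditioned on the previous pair, and returns the top-k Bangla candidates. It is not the authors' implementation: the four hand-aligned training words, the names `train` and `top_k`, the add-alpha smoothing, and the beam search are all assumptions made for this example, and the paper learns its TU alignments automatically rather than taking them as given.

```python
import math
from collections import defaultdict

START = ("<s>", "<s>")  # sentinel TU pair marking the start of a word

def train(aligned_words):
    """Count bigrams over TU pairs: counts[prev_pair][pair]."""
    counts = defaultdict(lambda: defaultdict(int))
    for pairs in aligned_words:
        prev = START
        for pair in pairs:
            counts[prev][pair] += 1
            prev = pair
    return counts

def top_k(word, counts, k=5, beam=20, alpha=0.1):
    """Return up to k Bangla candidates for a Romanized word."""
    # Lexicon: Roman TU -> set of Bangla TUs seen with it in training.
    lexicon = defaultdict(set)
    for following in counts.values():
        for src, tgt in following:
            lexicon[src].add(tgt)
    vocab = sum(len(targets) for targets in lexicon.values())

    def logp(prev, pair):
        # Add-alpha smoothed P(pair | prev); the smoothing is an assumption.
        seen = counts.get(prev, {})
        total = sum(seen.values())
        return math.log((seen.get(pair, 0) + alpha) / (total + alpha * vocab))

    # beams[i] holds hypotheses (log-prob, previous pair, output) covering word[:i].
    beams = [[] for _ in range(len(word) + 1)]
    beams[0].append((0.0, START, ""))
    for i in range(len(word)):
        kept = sorted(beams[i], key=lambda h: h[0], reverse=True)[:beam]
        for score, prev, out in kept:
            for src, targets in lexicon.items():
                if word.startswith(src, i):
                    for tgt in targets:
                        pair = (src, tgt)
                        beams[i + len(src)].append(
                            (score + logp(prev, pair), pair, out + tgt))

    # Different segmentations can yield the same output; keep the best score.
    best = {}
    for score, _, out in beams[len(word)]:
        if out not in best or score > best[out]:
            best[out] = score
    return sorted(best, key=best.get, reverse=True)[:k]

# Toy, hand-aligned training data (hypothetical, for illustration only).
toy = [
    [("a", "আ"), ("mi", "মি")],                # ami
    [("a", "আ"), ("ma", "মা"), ("r", "র")],    # amar
    [("tu", "তু"), ("mi", "মি")],              # tumi
    [("to", "তো"), ("ma", "মা"), ("r", "র")],  # tomar
]
model = train(toy)
print(top_k("tomar", model))
print(top_k("tumar", model))  # unseen word built from known TUs
```

With a real lexicon, a Roman TU would typically map to several Bangla TUs, so the list returned by `top_k` would contain competing candidates. A top-5 evaluation like the one described above would then check whether the reference word appears anywhere in that list.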