International Journal on Computer Science and Engineering

A Novel Approach for English to South Dravidian Language Statistical Machine Translation System

Abstract

Developing a full-fledged bilingual machine translation (MT) system for any two natural languages with limited electronic resources and tools is a challenging and demanding task. This paper presents the development of a statistical machine translation (SMT) system for English to South Dravidian languages such as Malayalam and Kannada that incorporates syntactic and morphological information. SMT is a data-oriented statistical framework for translating text from one natural language to another based on knowledge extracted from a bilingual corpus. Although there have been efforts towards building an English to South Dravidian translation system, no efficient system exists so far. The first and most important step in SMT is creating a well-aligned parallel corpus for training the system. Experimental research shows that the existing methodology for creating a bilingual parallel corpus is not efficient for English to South Dravidian SMT. To improve the performance of the translation system, we introduce a new approach to creating the parallel corpus. The main ideas that we have implemented and found very effective for English to South Dravidian SMT are: (i) reordering the English source sentence according to Dravidian syntax, (ii) applying root-suffix separation to both English and Dravidian words, and (iii) using morphological information, which substantially reduces the corpus size required for training the system. Because full-fledged parsing and morphological tools are unavailable for Malayalam and Kannada, sentence synthesis was done both manually and with an existing morphological analyzer created by Amrita University. Our experiments show that the systems perform well and achieve very competitive accuracy with small bilingual corpora. The proposed ideas can be applied directly to other South Dravidian languages such as Tamil and Telugu with minor changes.
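
The first two preprocessing ideas in the abstract, source reordering and root-suffix separation, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes the English sentence has already been chunked and labelled with the roles `S`, `V` and `O` by some external parser, and it uses a tiny hand-picked suffix list. The function names, the role labels, the `+suffix` token format and the suffix list are all hypothetical choices made for illustration.

```python
# Toy illustration only: hypothetical names and a hand-picked suffix list,
# not the method or tools described in the paper.

# Assumed English suffixes, checked in this order (illustrative, far from complete).
ENGLISH_SUFFIXES = ("ing", "ed", "s")


def reorder_svo_to_sov(chunks):
    """Move verb chunks to the end, keeping the order of everything else.

    `chunks` is a list of (role, text) pairs with roles such as 'S', 'V', 'O'.
    Dravidian languages are predominantly verb-final, so English SVO input
    becomes SOV.
    """
    non_verbs = [chunk for chunk in chunks if chunk[0] != "V"]
    verbs = [chunk for chunk in chunks if chunk[0] == "V"]
    return non_verbs + verbs


def split_root_suffix(word, min_root=3):
    """Split a word into [root, '+suffix'] using naive suffix stripping.

    Words whose root would be shorter than `min_root` are left whole.
    """
    for suffix in ENGLISH_SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_root:
            return [word[: -len(suffix)], "+" + suffix]
    return [word]


def preprocess(chunks):
    """Reorder the chunks to SOV, then split every word into root and suffix."""
    tokens = []
    for _, text in reorder_svo_to_sov(chunks):
        for word in text.lower().split():
            tokens.extend(split_root_suffix(word))
    return tokens


if __name__ == "__main__":
    sentence = [("S", "Ravi"), ("V", "reads"), ("O", "books")]
    print(preprocess(sentence))
```

A real system would replace the naive suffix stripping with a proper morphological analyzer and derive the reordering from a syntactic parse. The sketch only shows how both steps shape the source side of the parallel corpus before training.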