Conference: 9th International Conference on Language Resources and Evaluation

Sata-Anuvadak: Tackling Multiway Translation of Indian Languages

Abstract

We present a compendium of 110 Statistical Machine Translation systems built from parallel corpora of 11 Indian languages belonging to the Indo-Aryan and Dravidian families. We analyze the relationship between translation accuracy and the language families involved. We feel that insights obtained from this analysis will provide guidelines for creating machine translation systems for specific Indian language pairs. For our studies, we built phrase-based systems and some extensions. Across multiple languages, we show improvements on the baseline phrase-based systems using these extensions: (1) source-side reordering for English-Indian language translation, and (2) transliteration of untranslated words for Indian language-Indian language translation. These enhancements harness shared characteristics of Indian languages. To stimulate similar innovation widely in the NLP community, we have made the trained models for these language pairs publicly available.
