Journal: ACM Transactions on Asian Language Information Processing

Role of Morphology Injection in SMT: A Case Study from Indian Language Perspective


Abstract

Phrase-based Statistical Machine Translation (PBSMT) is commonly used for automatic translation. However, PBSMT runs into difficulty when the source language, the target language, or both are morphologically rich. Factored models are useful in such cases because they treat each word as a vector of factors. These factors can carry any information about the surface word, and the model uses that information while translating. The objective of the current work is to handle morphological inflections in Hindi, Marathi, and Malayalam using factored translation models when translating from English. Statistical MT approaches face the problem of data sparsity when translating into a morphologically rich language: a parallel corpus is very unlikely to contain all morphological forms of words. We propose a simple and effective solution that generates these unseen morphological forms and injects them into the original training corpus, thereby enriching the input with various morphological forms of words. We observe that morphology injection improves translation quality in terms of both adequacy and fluency. We verify this with experiments on three morphologically rich languages when translating from English. In the detailed evaluations, we observed an order-of-magnitude improvement in translation quality.
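The two ideas in the abstract, a word as a vector of factors and the injection of unseen morphological forms into the training data, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the choice of three factors (surface, lemma, suffix), the plain-concatenation rule, and the romanized toy strings are hypothetical and are not taken from the paper, whose actual generation procedure is not described in this abstract. The pipe-separated notation follows the convention used by factored toolkits such as Moses.

```python
from itertools import product


def to_factored(surface, lemma, suffix):
    """Encode one token as pipe-separated factors: surface|lemma|suffix."""
    # "NULL" marks an empty suffix so every token has the same number of factors.
    return "|".join([surface, lemma, suffix or "NULL"])


def generate_forms(lemmas, suffixes):
    """Toy generator (assumption): combine every lemma with every suffix
    by plain concatenation. Real morphology needs language-specific rules."""
    return [(lemma + suffix, lemma, suffix)
            for lemma, suffix in product(lemmas, suffixes)]


def inject(corpus, lemmas, suffixes):
    """Return the corpus plus factored entries for surface forms it lacks."""
    # Collect the surface factor (first field) of every token already present.
    seen = {tok.split("|")[0] for line in corpus for tok in line.split()}
    new_lines = [to_factored(surface, lemma, suffix)
                 for surface, lemma, suffix in generate_forms(lemmas, suffixes)
                 if surface not in seen]
    return corpus + new_lines


if __name__ == "__main__":
    # Hypothetical one-line corpus containing a single inflected form.
    corpus = ["ladakA|ladak|A"]
    for line in inject(corpus, ["ladak"], ["A", "e"]):
        print(line)
```

In this sketch the form already present in the corpus is left alone and only the missing form is appended, which mirrors the stated goal of covering morphological forms that the parallel corpus does not contain. A real system would inject aligned source and target entries, not target-side tokens alone.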
