Real-world use of pivot languages to translate low-resource languages

Abstract

Triangulation refers to the use of a pivot language when translating from a source language to a target language. Previous research on triangulation has focused only on large corpora in the same domain. This thesis conducts the first in-depth study of triangulation for four real-world low-resource languages in realistic data settings: Mawukakan, Maninkakan, Haitian Kreyol and Malagasy. For these languages, fluent translations are difficult to obtain with statistical machine translation because training data for the source-target language pair is limited. We compare and contrast several design choices one needs to consider when using triangulation. We observe that triangulation via French improves translations significantly for Mawukakan and Maninkakan, two languages spoken in West Africa. We also improve translations for real-world short messages sent in the aftermath of the 2010 Haiti earthquake and for news articles in Malagasy. As part of the dissertation, we build the first effective translation system for the first two of these languages and outperform the state of the art for Haitian Kreyol. We improve translation quality by injecting more data via pivot languages and show that, in realistic data settings, carefully considering triangulation design options is important. Furthermore, since for all four languages the low-resource language pair data and the pivot language pair data typically come from very different domains, we propose a novel iterative method to fine-tune the weighted mixture of direct and pivot-based phrase pairs, which significantly improves translation quality.
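
The abstract does not spell out how pivot-based phrase pairs are built or mixed with direct ones. As a rough illustration only, the sketch below uses the common textbook formulation of phrase-table triangulation, which marginalizes over pivot phrases, p(t|s) = Σ_p p(p|s) · p(t|p), and then blends the result with a direct table by linear interpolation. The function names, the toy phrase identifiers (s1, p1, t1, ...) and the fixed weight `lam` are all assumptions made for this example; the thesis's iterative weight-tuning method is not described in the abstract and is not reproduced here.

```python
from collections import defaultdict


def triangulate(src_pivot, pivot_tgt):
    """Build a source->target table by summing over shared pivot phrases.

    src_pivot[s][p] holds p(p|s); pivot_tgt[p][t] holds p(t|p).
    (Hypothetical table layout chosen for this sketch.)
    """
    table = defaultdict(lambda: defaultdict(float))
    for s, pivots in src_pivot.items():
        for p, prob_p_given_s in pivots.items():
            for t, prob_t_given_p in pivot_tgt.get(p, {}).items():
                table[s][t] += prob_p_given_s * prob_t_given_p
    return {s: dict(targets) for s, targets in table.items()}


def interpolate(direct, pivot, lam):
    """Linear mixture: lam * direct + (1 - lam) * pivot, per phrase pair."""
    mixed = {}
    for s in set(direct) | set(pivot):
        d = direct.get(s, {})
        v = pivot.get(s, {})
        mixed[s] = {
            t: lam * d.get(t, 0.0) + (1.0 - lam) * v.get(t, 0.0)
            for t in set(d) | set(v)
        }
    return mixed


# Toy tables with placeholder phrase identifiers (not real data).
src_pivot = {"s1": {"p1": 0.6, "p2": 0.4}}
pivot_tgt = {"p1": {"t1": 1.0}, "p2": {"t1": 0.5, "t2": 0.5}}
direct = {"s1": {"t2": 1.0}}

pivot_table = triangulate(src_pivot, pivot_tgt)
mixed_table = interpolate(direct, pivot_table, lam=0.5)  # lam is an assumed fixed weight
```

In this sketch the weight is a single hand-picked constant. A method like the one the abstract describes would instead adjust the mixture weights iteratively, for example against held-out data, but the details of that procedure are not given in this record.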

Bibliographic record

  • Author

    Dholakia, Rohit

  • Affiliation
  • Year 2014
  • Total pages
  • Original format PDF
  • Language of text
  • Chinese Library Classification
