Conference: LREC-2012

A Richly Annotated, Multilingual Parallel Corpus for Hybrid Machine Translation

Abstract

In recent years, machine translation (MT) research has focused on investigating how hybrid machine translation and system combination approaches can be designed so that the resulting hybrid translations improve on the individual "component" translations. As a first step towards this objective, we have developed a parallel corpus containing source text and the corresponding translation output from a number of machine translation engines, annotated with metadata that captures aspects of the translation process performed by the different MT systems. This corpus aims to serve as a basic resource for further research on whether hybrid machine translation algorithms and system combination techniques can benefit from additional (linguistically motivated, decoding, and runtime) information provided by the different systems involved. In this paper, we describe the annotated corpus we have created. We provide an overview of the component MT systems and the XLIFF-based annotation format we have developed. We also report on first experiments with the ML4HMT corpus data.