Venue: 3rd Workshop on Hybrid Approaches to Translation

Comparing CRF and template-matching in phrasing tasks within a Hybrid MT system

Abstract

This article focuses on improving the performance of PRESEMT, a hybrid Machine Translation (MT) system. The PRESEMT methodology is readily portable to new language pairs and allows MT systems to be built with minimal reliance on expensive resources. PRESEMT is phrase-based: it uses a small parallel corpus to extract structural transformations from the source language (SL) to the target language (TL), while the TL language model is extracted from large monolingual corpora. This article examines how to maximise the amount of information extracted from a very limited parallel corpus. Emphasis is therefore placed on the module that learns to segment arbitrary SL input text into phrases by extrapolating information from a limited-size parsed TL text, avoiding the need for an SL parser. An established method based on Conditional Random Fields (CRF) is compared with a much simpler template-matching algorithm to determine which approach extracts the more accurate model. Experimental results indicate that, for a limited-size training set, template-matching generates a superior model that leads to higher-quality translations.
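The abstract does not spell out the template-matching algorithm, so the following is only a toy illustration of the general idea, not PRESEMT's actual phrasing module. It assumes that each training sentence has already been reduced to a sequence of labelled phrases, each a (phrase label, part-of-speech tag sequence) pair, for example after TL parse information has been transferred to SL text. It counts the tag-sequence templates seen in training, then segments a new tag sequence greedily from left to right, preferring the longest known template. The labels `PC`/`VC`/`O`, the tag set, the `max_len` parameter and the functions `learn_templates`/`segment` are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# A phrase is (hypothetical phrase label, tuple of POS tags it covers).
Phrase = Tuple[str, Tuple[str, ...]]


def learn_templates(parsed: Sequence[Sequence[Phrase]]) -> Counter:
    """Count every (label, tag-sequence) template seen in the training data."""
    counts: Counter = Counter()
    for sentence in parsed:
        for label, tags in sentence:
            counts[(label, tuple(tags))] += 1
    return counts


def segment(tags: Sequence[str], templates: Counter, max_len: int = 6) -> List[Phrase]:
    """Greedy left-to-right segmentation that prefers the longest known template.

    Each tag sequence is mapped to its most frequent label; a tag covered by
    no template becomes a single-token phrase labelled "O" (hypothetical).
    """
    best: Dict[Tuple[str, ...], Tuple[str, int]] = {}
    for (label, seq), count in templates.items():
        if seq not in best or count > best[seq][1]:
            best[seq] = (label, count)

    phrases: List[Phrase] = []
    i = 0
    while i < len(tags):
        # Try the longest candidate span first, shrinking until a template matches.
        for n in range(min(max_len, len(tags) - i), 0, -1):
            seq = tuple(tags[i:i + n])
            if seq in best:
                phrases.append((best[seq][0], seq))
                i += n
                break
        else:
            # No template covers this tag: emit it as an unlabelled singleton.
            phrases.append(("O", (tags[i],)))
            i += 1
    return phrases


if __name__ == "__main__":
    # Toy, invented training data (not taken from the paper).
    training = [
        [("PC", ("DT", "NN")), ("VC", ("VBD",)), ("PC", ("IN", "DT", "JJ", "NN"))],
        [("PC", ("NNP",)), ("VC", ("MD", "VB")), ("PC", ("DT", "NN"))],
    ]
    templates = learn_templates(training)
    print(segment(["DT", "NN", "VBD", "IN", "DT", "JJ", "NN", "RB"], templates))
```

With this toy data the input splits into `PC (DT NN)`, `VC (VBD)` and `PC (IN DT JJ NN)`, while the trailing `RB`, which no template covers, falls back to a single-token `O` phrase. Exact lookup of this kind needs no training beyond counting, whereas a CRF learns feature weights from the same data; which works better on small corpora is the empirical question the paper addresses.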