Conference: Second Workshop on Hybrid Approaches to Translation, 2013

A Hybrid Word Alignment Model for Phrase-Based Statistical Machine Translation


Abstract

This paper proposes a hybrid word alignment model for Phrase-Based Statistical Machine Translation (PB-SMT). The proposed hybrid alignment model provides the most informative alignment links offered by both unsupervised and semi-supervised word alignment models. Two unsupervised word alignment models (GIZA++ and Berkeley aligner) and a rule-based aligner are combined. The rule-based aligner aligns only named entities (NEs) and chunks. The NEs are aligned through transliteration using a joint source-channel model. Chunks are aligned using a bootstrapping approach: the source chunks are translated into the target language with a baseline PB-SMT system, and the resulting target chunks are then validated against the target corpus using a fuzzy matching technique. All experiments are carried out after single-tokenizing the multi-word NEs. Our best system provided significant improvements over the baseline as measured by BLEU.
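The abstract does not say how the aligner outputs are merged or how multi-word NEs are joined, so the following is only a minimal sketch under stated assumptions. It assumes NEs are joined with underscores, that links on which all unsupervised aligners agree are kept, and that the rule-based NE and chunk links are then added on top. The names `single_tokenize` and `combine_alignments` are hypothetical and do not come from the paper, GIZA++, or the Berkeley aligner.

```python
from typing import Iterable, List, Set, Tuple

# An alignment link is (source token index, target token index).
Link = Tuple[int, int]


def single_tokenize(tokens: List[str], ne_spans: Iterable[Tuple[int, int]]) -> List[str]:
    """Join each multi-word NE span [start, end) into one token.

    Assumption: underscore joining; the abstract only says NEs are
    "single-tokenized" and does not give the exact scheme.
    """
    span_end = {start: end for start, end in ne_spans}
    out: List[str] = []
    i = 0
    while i < len(tokens):
        end = span_end.get(i)
        if end is not None and end > i:
            out.append("_".join(tokens[i:end]))
            i = end
        else:
            out.append(tokens[i])
            i += 1
    return out


def combine_alignments(unsupervised: List[Set[Link]], rule_based: Set[Link]) -> Set[Link]:
    """Assumed combination rule, not the paper's: keep the links that every
    unsupervised aligner proposes, then add the rule-based NE/chunk links."""
    agreed = set.intersection(*unsupervised) if unsupervised else set()
    return agreed | rule_based
```

For example, `single_tokenize(["New", "York", "is", "big"], [(0, 2)])` merges the first two tokens into `New_York`, after which alignment indices refer to the shortened token list.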
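The chunk validation step can be illustrated with a small sketch. The abstract names only "a fuzzy matching technique", so the token-level similarity from Python's standard `difflib.SequenceMatcher`, the 0.8 threshold, and the function name `best_fuzzy_match` are all assumptions made for illustration. The paper's actual matching measure may differ.

```python
from difflib import SequenceMatcher
from typing import Iterable, Optional, Tuple


def best_fuzzy_match(
    candidate: str,
    target_chunks: Iterable[str],
    threshold: float = 0.8,  # assumed cut-off, not taken from the paper
) -> Optional[Tuple[str, float]]:
    """Return the target-corpus chunk most similar to a translated chunk,
    or None if no chunk reaches the threshold.

    Similarity is SequenceMatcher's ratio over whitespace tokens, used
    here as a stand-in for the unspecified fuzzy matching technique.
    """
    cand_tokens = candidate.split()
    best: Optional[Tuple[str, float]] = None
    for chunk in target_chunks:
        score = SequenceMatcher(None, cand_tokens, chunk.split()).ratio()
        if score >= threshold and (best is None or score > best[1]):
            best = (chunk, score)
    return best
```

In this sketch, a source chunk whose baseline translation finds a validated target chunk would yield one source-target chunk pair for the rule-based aligner; translations with no match above the threshold are discarded.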
