IEICE Transactions on Information and Systems

Integration of Multiple Bilingually-Trained Segmentation Schemes into Statistical Machine Translation


Abstract

This paper proposes an unsupervised word segmentation algorithm that identifies word boundaries in continuous (unsegmented) source-language text in order to improve the translation quality of statistical machine translation (SMT). The method can be applied to any language pair in which the source language is written without word boundaries and the segmentation of the target language is known. In the first step, an iterative bootstrap method learns multiple segmentation schemes that are consistent with the phrasal segmentations of an SMT system trained on the resegmented bitext. In the second step, the multiple segmentation schemes are integrated into a single SMT system by characterizing the source-language side and merging identical translation pairs of the differently segmented SMT models. Experiments translating five Asian languages into English show that integrating multiple segmentation schemes outperforms SMT models trained on any single learned word segmentation and performs comparably to available segmentation tools built from monolingual data.
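The merging in the second step can be illustrated with a minimal Python sketch. It assumes that "characterizing the source-language side" means rewriting each source phrase as an unsegmented character string, so that phrase pairs learned under different segmentation schemes become directly comparable. The phrase-table layout, the function names `characterize` and `merge_phrase_tables`, and the rule of averaging the probabilities of merged pairs are all hypothetical illustrations, not details taken from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical phrase-table layout:
# (segmented source phrase as a tuple of words, target phrase) -> translation probability.
PhraseTable = Dict[Tuple[Tuple[str, ...], str], float]


def characterize(source_words: Tuple[str, ...]) -> str:
    """Drop word boundaries so phrases from different segmentations can be compared.

    Assumption: 'characterizing' the source side means representing it as a plain
    character sequence.
    """
    return "".join(source_words)


def merge_phrase_tables(tables: List[PhraseTable]) -> Dict[Tuple[str, str], float]:
    """Merge tables from differently segmented models into one character-level table.

    Pairs with the same characterized source and the same target are treated as
    identical; their probabilities are averaged (a hypothetical combination rule).
    """
    scores: Dict[Tuple[str, str], List[float]] = defaultdict(list)
    for table in tables:
        for (source_words, target), prob in table.items():
            scores[(characterize(source_words), target)].append(prob)
    return {pair: sum(probs) / len(probs) for pair, probs in scores.items()}


if __name__ == "__main__":
    # Two toy models that segment the same Japanese source string differently.
    model_a: PhraseTable = {(("東京", "駅"), "tokyo station"): 0.6}
    model_b: PhraseTable = {
        (("東京駅",), "tokyo station"): 0.8,
        (("東京駅",), "the tokyo station"): 0.2,
    }
    merged = merge_phrase_tables([model_a, model_b])
    for (source, target), prob in sorted(merged.items()):
        print(source, target, round(prob, 2))
```

Averaging is only one plausible way to combine the scores of merged pairs; a full system would carry every phrase-table feature and might instead keep the maximum or record how many segmentation schemes proposed each pair.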
