
Integrating source-language context into phrase-based statistical machine translation

Abstract

The translation features typically used in Phrase-Based Statistical Machine Translation (PB-SMT) model dependencies between the source and target phrases, but not among the phrases in the source language themselves. A swathe of research has demonstrated that integrating source context modelling directly into log-linear PB-SMT can positively influence the weighting and selection of target phrases, and thus improve translation quality. In this contribution we present a revised, extended account of our previous work on using a range of contextual features, including lexical features of neighbouring words, supertags, and dependency information. We add a number of novel aspects, including the use of semantic roles as new contextual features in PB-SMT, adding new language pairs, and examining the scalability of our research to larger amounts of training data. While our results are mixed across feature selections, classifier hyperparameters, language pairs, and learning curves, we observe that including contextual features of the source sentence in general produces improvements. The most significant improvements involve the integration of long-distance contextual features, such as dependency relations in combination with part-of-speech tags in Dutch-to-English subtitle translation, the combination of dependency parse and semantic role information in English-to-Dutch parliamentary debate translation, or supertag features in English-to-Chinese translation.
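As a rough illustration of the idea in the abstract, the sketch below shows how a source-context feature can change which target phrase a log-linear model selects. It is not the authors' implementation: the candidate phrases, probabilities, weights, and the names `loglinear_score`, `context_prob`, and `best` are all hypothetical, and the context probabilities stand in for the output of some classifier over neighbouring words, supertags, or dependency information.

```python
import math

def loglinear_score(features, weights):
    """Log-linear model score: sum over features of weight * feature value."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical English candidates for the Dutch source word "bank",
# each with a context-independent phrase translation log-probability.
candidates = {
    "bank": {"tm": math.log(0.6)},
    "couch": {"tm": math.log(0.4)},
}

# Hypothetical context-dependent probabilities, standing in for a classifier
# that has seen the source-side context (e.g. "zit op de bank").
context_prob = {"bank": 0.2, "couch": 0.8}

# Hypothetical feature weights; in practice these would be tuned.
weights = {"tm": 1.0, "ctx": 1.0}

def best(cands, use_context):
    """Return the highest-scoring target phrase, optionally adding the context feature."""
    scored = {}
    for target, feats in cands.items():
        feats = dict(feats)  # copy so the shared candidate table is not modified
        if use_context:
            feats["ctx"] = math.log(context_prob[target])
        scored[target] = loglinear_score(feats, weights)
    return max(scored, key=scored.get)

print(best(candidates, use_context=False))
print(best(candidates, use_context=True))
```

With these made-up numbers, the translation-model feature alone prefers "bank", while adding the context feature shifts the choice to "couch". This is the kind of reweighting of target phrases that the abstract describes.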