9th Workshop on Statistical Machine Translation

Using Comparable Corpora to Adapt MT Models to New Domains

Abstract

In previous work we showed that when an SMT model trained on old-domain data is used to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target the errors caused by inaccurate translation model scores, using new-domain comparable corpora that we mine from Wikipedia. We assume access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and run evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting the models with the new features improves machine translation quality in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150-million-word Canadian Hansard dataset.
