ACL Workshop on Statistical Machine Translation

Using Comparable Corpora to Adapt MT Models to New Domains

Abstract

In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target errors due to inaccurate translation model scores using new-domain comparable corpora, which we mine from Wikipedia. We assume that we have access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and do evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting models with the new features improves the quality of machine translations in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150-million-word Canadian Hansard dataset.
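
The abstract does not say which feature scores are estimated from the comparable corpora, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes the comparable corpus is a list of document pairs linked across languages (as Wikipedia articles on the same topic might be) and scores each phrase pair by the cosine similarity of the two phrases' per-document occurrence counts. The function names, the toy French-English data, and the choice of this particular similarity are all hypothetical.

```python
import math

def count_phrase(phrase, doc):
    """Count token-level occurrences of `phrase` in `doc` (whitespace tokens)."""
    p, d = phrase.split(), doc.split()
    n = len(p)
    return sum(d[i:i + n] == p for i in range(len(d) - n + 1))

def cosine(u, v):
    """Cosine similarity of two equal-length count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def doc_similarity_feature(src_phrase, tgt_phrase, doc_pairs):
    """Hypothetical feature: do the two phrases occur in the same linked documents?"""
    src_counts = [count_phrase(src_phrase, src) for src, _ in doc_pairs]
    tgt_counts = [count_phrase(tgt_phrase, tgt) for _, tgt in doc_pairs]
    return cosine(src_counts, tgt_counts)

# Toy comparable corpus: (source document, target document) pairs (invented data).
doc_pairs = [
    ("le coeur pompe le sang", "the heart pumps blood"),
    ("le sang transporte l'oxygène", "blood carries oxygen"),
    ("la chambre vote la loi", "the house votes on the law"),
]

# Phrase pairs as they might appear in a baseline phrase table (invented).
phrase_pairs = [("sang", "blood"), ("sang", "law")]

# One extra score per phrase pair, to be added alongside the baseline scores.
features = {
    pair: doc_similarity_feature(pair[0], pair[1], doc_pairs)
    for pair in phrase_pairs
}

for pair, score in features.items():
    print(pair, round(score, 3))
```

In this toy setup the pair ("sang", "blood") scores close to 1 because both phrases occur in the same two linked documents, while ("sang", "law") scores 0 because they never co-occur in a linked pair. In a real system such a score would be one more feature whose weight is tuned on the small new-domain parallel set, as the abstract describes.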
