Using Comparable Corpora to Adapt MT Models to New Domains

Abstract

In previous work we showed that when using an SMT model trained on old-domain data to translate text in a new domain, most errors are due to unseen source words, unseen target translations, and inaccurate translation model scores (Irvine et al., 2013a). In this work, we target errors due to inaccurate translation model scores using new-domain comparable corpora, which we mine from Wikipedia. We assume that we have access to a large old-domain parallel training corpus but only enough new-domain parallel data to tune model parameters and do evaluation. We use the new-domain comparable corpora to estimate additional feature scores over the phrase pairs in our baseline models. Augmenting models with the new features improves the quality of machine translations in the medical and science domains by up to 1.3 BLEU points over very strong baselines trained on the 150-million-word Canadian Hansard dataset.
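
The idea of scoring existing phrase pairs with an extra feature estimated from comparable documents can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which features are estimated, so the document co-occurrence similarity used here, the function names (`cooccurrence_similarity`, `augment_phrase_table`, `loglinear_score`), the feature names, and the toy data are all hypothetical.

```python
import math


def cooccurrence_similarity(src_phrase, tgt_phrase, doc_pairs):
    """Cosine similarity of two phrases' occurrence vectors over paired documents.

    doc_pairs is a list of (source_text, target_text) comparable documents,
    for example linked Wikipedia articles (an assumed input format).
    Occurrence is tested with a crude substring match to keep the sketch short.
    """
    src_vec = [1.0 if src_phrase in s else 0.0 for s, _ in doc_pairs]
    tgt_vec = [1.0 if tgt_phrase in t else 0.0 for _, t in doc_pairs]
    dot = sum(a * b for a, b in zip(src_vec, tgt_vec))
    # The vectors are binary, so the sum of entries equals the squared norm.
    norm = math.sqrt(sum(src_vec)) * math.sqrt(sum(tgt_vec))
    return dot / norm if norm > 0 else 0.0


def augment_phrase_table(phrase_table, doc_pairs):
    """Return a copy of the phrase table with one extra feature per phrase pair."""
    augmented = {}
    for (src, tgt), features in phrase_table.items():
        new_features = dict(features)  # leave the baseline table untouched
        new_features["comparable_sim"] = cooccurrence_similarity(src, tgt, doc_pairs)
        augmented[(src, tgt)] = new_features
    return augmented


def loglinear_score(features, weights):
    """Weighted sum of feature values, as in a log-linear translation model."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Toy data (hypothetical): an old-domain table and new-domain medical documents.
phrase_table = {
    ("coeur", "heart"): {"tm_logprob": -1.2},
    ("coeur", "core"): {"tm_logprob": -0.9},
}
doc_pairs = [
    ("le coeur pompe le sang", "the heart pumps blood"),
    ("maladie du coeur", "heart disease"),
    ("le noyau du réacteur", "the reactor core"),
]
augmented = augment_phrase_table(phrase_table, doc_pairs)
weights = {"tm_logprob": 1.0, "comparable_sim": 2.0}
for pair, features in augmented.items():
    print(pair, loglinear_score(features, weights))
```

In a setup like the one the abstract describes, the weight on the new feature would be tuned on the small new-domain parallel set rather than fixed by hand as it is here.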

Bibliographic details

Source: ACL Workshop on Statistical Machine Translation, 2014, 8 pages
Authors: Ann Irvine; Chris Callison-Burch
Original format: PDF
Chinese Library Classification: Programming; Software Engineering

Similar literature

1. Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora [J]. Ivan Vulić, Wim De Smet, Marie-Francine Moens. Information Retrieval, 2013, No. 3.
2. Cross-language information retrieval models based on latent topic models trained with document-aligned comparable corpora [J]. Ivan Vulic, Wim De Smet, Marie-Francine Moens. Information Retrieval, 2013, No. 3.
3. Extracting translations from comparable corpora for Cross-Language Information Retrieval using the language modeling framework [J]. Razieh Rahimi, Azadeh Shakery, Irwin King. Information Processing & Management, 2016, No. 2.
4. Using Comparable Corpora to Adapt MT Models to New Domains [C]. Ann Irvine, Chris Callison-Burch. 9th Workshop on Statistical Machine Translation, 2014.
5. Domain Adaptive Computational Models for Computer Vision [D]. Demakethepalli Venkateswara, Hemanth Kumar. 2017.
6. Comparability of dental implant site ridge measurements using ultra-low-dose multidetector row computed tomography combined with filtered back-projection adaptive statistical iterative reconstruction and model-based iterative reconstruction [O]. Asma’a Abdurrahman Al-Ekrish, Reema Al-Shawaf, Wafa Alfaleh.
7. Using Comparable Corpora to Adapt MT Models to New Domains [O]. Ann Irvine, Chris Callison-Burch. 2014.
8. Statistical Word-Level Translation Model for Comparable Corpora [R]. Diab, M., Finch, S. 2000.
