Hybrid data-driven models of machine translation

Declan Groves; Andy Way

Abstract

This paper presents an extended, harmonised account of our previous work on combining subsentential alignments from phrase-based statistical machine translation (SMT) and example-based MT (EBMT) systems to create novel hybrid data-driven systems capable of outperforming the baseline SMT and EBMT systems from which they were derived. In previous work, we demonstrated that while an EBMT system is capable of outperforming a phrase-based SMT (PBSMT) system constructed from freely available resources, a hybrid 'example-based' SMT system incorporating marker chunks and SMT subsentential alignments is capable of outperforming both baseline translation models for French-English translation. In this paper, we show that similar gains are to be had from constructing a hybrid 'statistical' EBMT system. Unlike the previous research, here we use the Europarl training and test sets, which are fast becoming the standard data in the field. On these data sets, while all hybrid 'statistical' EBMT variants still fall short of the quality achieved by the baseline PBSMT system, we show that adding the marker chunks to create a hybrid 'example-based' SMT system outperforms the two baseline systems from which it is derived. Furthermore, we provide further evidence in favour of hybrid systems by adding an SMT target-language model to the EBMT system, and demonstrate that this too has a positive effect on translation quality. We also show that many of the subsentential alignments derived from the Europarl corpus are created by either the PBSMT or the EBMT system, but not by both. In sum, therefore, despite the obvious convergence of the two paradigms, the crucial differences between SMT and EBMT contribute positively to the overall translation quality. 
The central thesis of this paper is that any researcher who continues to develop an MT system using either of these approaches will benefit further from integrating the advantages of the other model; dogged adherence to one approach will lead to inferior systems being developed.
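
The abstract describes two operations without giving details: pooling subsentential alignments from the PBSMT and EBMT systems, and checking how many alignments are produced by one system but not the other. The following is a minimal sketch of both ideas on toy data. It assumes that alignments are plain (source, target) chunk pairs and that the pooled pairs are scored by relative frequency; the abstract does not confirm either assumption, and the function names `merge_alignments` and `overlap_report` and the example pairs are hypothetical.

```python
from collections import Counter


def merge_alignments(smt_pairs, ebmt_pairs):
    """Pool (source, target) chunk pairs from both systems and return
    relative-frequency estimates P(target | source).
    Assumption: relative frequency is used only for illustration."""
    counts = Counter(smt_pairs)
    counts.update(ebmt_pairs)
    # Total count of each source chunk across all of its targets.
    source_totals = Counter()
    for (src, _tgt), n in counts.items():
        source_totals[src] += n
    return {pair: n / source_totals[pair[0]] for pair, n in counts.items()}


def overlap_report(smt_pairs, ebmt_pairs):
    """Count distinct pairs found by only one system or by both."""
    smt, ebmt = set(smt_pairs), set(ebmt_pairs)
    return {
        "smt_only": len(smt - ebmt),
        "ebmt_only": len(ebmt - smt),
        "both": len(smt & ebmt),
    }


# Hypothetical toy alignments (French-English), not taken from the paper.
smt_pairs = [
    ("la maison", "the house"),
    ("la maison", "the house"),
    ("maison bleue", "blue house"),
]
ebmt_pairs = [
    ("la maison", "the house"),
    ("la maison", "the home"),
    ("dans la maison", "in the house"),
]

merged = merge_alignments(smt_pairs, ebmt_pairs)
report = overlap_report(smt_pairs, ebmt_pairs)
print(report)
print(merged[("la maison", "the house")])
```

In this toy case each system contributes pairs the other lacks, which is the kind of complementarity the abstract reports for the Europarl alignments.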

Bibliographic details

Source: Machine Translation, 2005, No. 4, pp. 301-323 (23 pages)
Authors: Declan Groves; Andy Way
Affiliation: School of Computing, Dublin City University, Dublin 9, Ireland
Indexed in: Engineering Index (EI), USA
Original format: PDF
Language: English
Chinese Library Classification: Automation technology; computer technology
Keywords: hybrid; example-based MT; statistical MT; statistical language models; convergence; chunk coverage; Europarl corpus
Date added to database: 2022-08-18 00:40:51
