In this paper we show how to train statistical machine translation systems on real-life tasks using only non-parallel monolingual data from two languages. We present a modification of the method shown in (Ravi and Knight, 2011) that is scalable to vocabulary sizes of several thousand words. On the task shown in (Ravi and Knight, 2011) we obtain better results with only 5% of the computational effort when running our method with an n-gram language model. The efficiency improvement of our method allows us to run experiments with vocabulary sizes of around 5,000 words, such as a non-parallel version of the Verbmobil corpus. We also report results using data from the monolingual French and English Gigaword corpora.