Journal: Computational Linguistics
Online Learning for Statistical Machine Translation

Abstract

We present online learning techniques for statistical machine translation (SMT). Large training data sets that grow constantly over time are becoming increasingly common in the field of SMT, for example in translation agencies or in the daily translation of government proceedings. When new knowledge is to be incorporated into the SMT models, batch learning techniques require very time-consuming estimation processes over the whole training set, which may take days or weeks to execute. With online learning, new training samples can be processed individually in real time. For this purpose, we define a state-of-the-art SMT model composed of a set of submodels, together with a set of incremental update rules for each of these submodels. To test our techniques, we studied two well-known SMT applications that can be used in translation agencies: post-editing and interactive machine translation. In both scenarios, the SMT system collaborates with the user to generate high-quality translations. These user-validated translations can be used to extend the SMT models by means of online learning. Empirical results in the two scenarios show that frequent updates have a great impact on system performance. The time cost of such updates was also measured by comparing the efficiency of a batch learning SMT system with that of an online learning system; the comparison shows that online learning can work in real time, whereas the time cost of batch retraining soon becomes infeasible. Empirical results also showed that the performance of online learning is comparable to that of batch learning. Moreover, the proposed techniques were able to learn from previously estimated models or from scratch. We also propose two new measures to predict the effectiveness of online learning in SMT tasks. The translation system with online learning capabilities presented here is implemented in the open-source Thot toolkit for SMT.
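
As a rough illustration of what an incremental update rule for a single submodel might look like, the sketch below keeps a relative-frequency phrase model as counts (sufficient statistics), so that each user-validated sentence is folded in with a constant-time update instead of re-estimating over the whole training set. This is an assumption-laden toy, not the update rules defined in the paper and not the Thot API: the class `IncrementalPhraseTable`, its methods, and the example phrase pairs are all hypothetical.

```python
from collections import defaultdict


class IncrementalPhraseTable:
    """Hypothetical toy submodel: p(target | source) stored as raw counts."""

    def __init__(self):
        self.pair_count = defaultdict(int)    # c(source, target)
        self.source_count = defaultdict(int)  # c(source)

    def update(self, phrase_pairs):
        """Online step: fold the phrase pairs of ONE validated sentence in."""
        for src, tgt in phrase_pairs:
            self.pair_count[(src, tgt)] += 1
            self.source_count[src] += 1

    def prob(self, src, tgt):
        """Relative frequency c(src, tgt) / c(src); 0.0 for an unseen source."""
        total = self.source_count.get(src, 0)
        if total == 0:
            return 0.0
        return self.pair_count.get((src, tgt), 0) / total


def batch_prob(corpus, src, tgt):
    """Batch baseline: recount over the entire corpus on every query."""
    pairs = [pair for sentence in corpus for pair in sentence]
    total = sum(1 for s, _ in pairs if s == src)
    if total == 0:
        return 0.0
    return sum(1 for s, t in pairs if s == src and t == tgt) / total


# Stream of user-validated sentences, each given as extracted phrase pairs
# (made-up data for illustration only).
stream = [
    [("casa", "house")],
    [("casa", "home"), ("verde", "green")],
    [("casa", "house")],
]

online = IncrementalPhraseTable()
seen = []
for sentence in stream:
    online.update(sentence)  # cost depends only on the new sentence
    seen.append(sentence)
    # After every update the online estimate matches a full batch recount.
    assert abs(online.prob("casa", "house")
               - batch_prob(seen, "casa", "house")) < 1e-12
```

For a count-based model like this one, the online and batch estimates coincide exactly; submodels trained with iterative procedures would need a different kind of incremental rule, which this sketch does not attempt to show.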
