
Selective Knowledge Distillation for Neural Machine Translation



Abstract

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance a model's performance by transferring the teacher model's knowledge on each training sample. However, previous work rarely discusses the different impacts of and connections among these samples, which serve as the medium for transferring teacher knowledge. In this paper, we design a novel protocol that can effectively analyze the different impacts of samples by comparing various partitions of the samples. Based on the above protocol, we conduct extensive experiments and find that more teacher knowledge is not always better: knowledge from specific samples may even hurt the overall performance of knowledge distillation. Finally, to address these issues, we propose two simple yet effective strategies, i.e., batch-level and global-level selection, to pick suitable samples for distillation. We evaluate our approaches on two large-scale machine translation tasks, WMT'14 English-German and WMT'19 Chinese-English. Experimental results show that our approaches yield up to +1.28 and +0.89 BLEU point improvements over the Transformer baseline, respectively.
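The abstract only names the two selection strategies; the snippet below is a minimal PyTorch sketch of the general idea behind batch-level selection, under the assumption that tokens are ranked by the student's per-token cross-entropy against the reference and only the hardest fraction within a batch receives the word-level distillation loss. The function name selective_kd_loss and the keep_ratio parameter are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def selective_kd_loss(student_logits, teacher_logits, targets,
                      pad_id=0, keep_ratio=0.5, temperature=1.0):
    """Word-level KD applied only to a selected subset of tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors
    targets: (batch, seq_len) reference token ids
    Batch-level selection (assumed criterion): keep the keep_ratio fraction
    of non-padding tokens with the highest student cross-entropy against
    the reference, i.e. the tokens the student currently finds hardest.
    """
    vocab = student_logits.size(-1)
    s = student_logits.reshape(-1, vocab)
    t = teacher_logits.reshape(-1, vocab)
    y = targets.reshape(-1)

    # Per-token cross-entropy of the student w.r.t. the reference.
    token_ce = F.cross_entropy(s, y, reduction="none")
    real = y.ne(pad_id)

    # Rank real tokens by difficulty, keep the hardest keep_ratio of them.
    n_keep = max(1, int(real.sum().item() * keep_ratio))
    scores = token_ce.masked_fill(~real, float("-inf"))
    keep_idx = scores.topk(n_keep).indices

    # KL(teacher || student) on the selected tokens only.
    t_logp = F.log_softmax(t[keep_idx] / temperature, dim=-1)
    s_logp = F.log_softmax(s[keep_idx] / temperature, dim=-1)
    kd = F.kl_div(s_logp, t_logp, reduction="batchmean", log_target=True)

    # Usual label loss on all real tokens, plus the selective KD term.
    nll = F.cross_entropy(s, y, ignore_index=pad_id)
    return nll + (temperature ** 2) * kd
```

A global-level variant would rank tokens against statistics accumulated across many batches rather than within the current batch alone; the in-batch top-k step above would then be replaced by a comparison against that running estimate.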
