
Selective Knowledge Distillation for Neural Machine Translation



Abstract

Neural Machine Translation (NMT) models achieve state-of-the-art performance on many translation benchmarks. As an active research field in NMT, knowledge distillation is widely applied to enhance a model's performance by transferring the teacher model's knowledge on each training sample. However, previous work rarely discusses the different impacts of and connections among these samples, which serve as the medium for transferring teacher knowledge. In this paper, we design a novel protocol that can effectively analyze the different impacts of samples by comparing various partitions of the samples. Based on the above protocol, we conduct extensive experiments and find that more teacher knowledge is not always better: knowledge from specific samples may even hurt the overall performance of knowledge distillation. Finally, to address these issues, we propose two simple yet effective strategies, i.e., batch-level and global-level selection, to pick suitable samples for distillation. We evaluate our approaches on two large-scale machine translation tasks, WMT'14 English-German and WMT'19 Chinese-English. Experimental results show that our approaches yield up to +1.28 and +0.89 BLEU point improvements over the Transformer baseline, respectively.
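The abstract only names the two selection strategies; the snippet below is a minimal PyTorch sketch of the general idea behind batch-level selection, under the assumption that tokens are ranked by the student's per-token cross-entropy against the reference and only the hardest fraction within a batch receives the word-level distillation loss. The function name selective_kd_loss and the keep_ratio parameter are illustrative and not taken from the paper.

```python
import torch
import torch.nn.functional as F


def selective_kd_loss(student_logits, teacher_logits, targets,
                      pad_id=0, keep_ratio=0.5, temperature=1.0):
    """Word-level KD applied only to a selected subset of tokens.

    student_logits, teacher_logits: (batch, seq_len, vocab) tensors
    targets: (batch, seq_len) reference token ids
    Batch-level selection (assumed criterion): keep the keep_ratio fraction
    of non-padding tokens with the highest student cross-entropy against
    the reference, i.e. the tokens the student currently finds hardest.
    """
    vocab = student_logits.size(-1)
    s = student_logits.reshape(-1, vocab)
    t = teacher_logits.reshape(-1, vocab)
    y = targets.reshape(-1)

    # Per-token cross-entropy of the student w.r.t. the reference.
    token_ce = F.cross_entropy(s, y, reduction="none")
    real = y.ne(pad_id)

    # Rank real tokens by difficulty, keep the hardest keep_ratio of them.
    n_keep = max(1, int(real.sum().item() * keep_ratio))
    scores = token_ce.masked_fill(~real, float("-inf"))
    keep_idx = scores.topk(n_keep).indices

    # KL(teacher || student) on the selected tokens only.
    t_logp = F.log_softmax(t[keep_idx] / temperature, dim=-1)
    s_logp = F.log_softmax(s[keep_idx] / temperature, dim=-1)
    kd = F.kl_div(s_logp, t_logp, reduction="batchmean", log_target=True)

    # Usual label loss on all real tokens, plus the selective KD term.
    nll = F.cross_entropy(s, y, ignore_index=pad_id)
    return nll + (temperature ** 2) * kd
```

A global-level variant would rank tokens against statistics accumulated across many batches rather than within the current batch alone; the in-batch top-k step above would then be replaced by a comparison against that running estimate.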
