
Preparing lessons: Improve knowledge distillation with better supervision


Abstract

Knowledge distillation (KD) is widely applied in the training of efficient neural networks. A compact model, trained to mimic the representation of a cumbersome model for the same task, generally obtains better performance than one trained with the ground-truth labels alone. Previous KD-based works mainly focus on two aspects: (1) designing various feature representations for knowledge transfer; (2) introducing different training mechanisms such as progressive learning or adversarial learning. In this paper, we revisit standard KD and observe that training with the teacher's logits may suffer from incorrect and uncertain supervision. To tackle these problems, we propose two novel approaches to deal with incorrect logits and uncertain logits respectively, called Logits Adjustment (LA) and Dynamic Temperature Distillation (DTD). Specifically, LA rectifies incorrect logits according to the ground-truth label and certain rules, while DTD treats the temperature of KD as a dynamic, sample-wise parameter rather than a static, global hyper-parameter, reflecting the uncertainty of each sample's logits. By iteratively updating the sample-wise temperature, the student model can pay more attention to the samples that confuse the teacher model. Experiments on CIFAR-10/100, CINIC-10 and Tiny ImageNet verify that the proposed methods yield encouraging improvements over standard KD. Furthermore, given their simple implementations, LA and DTD can easily be attached to many KD-based frameworks and bring improvements without extra training time or computing resources. (c) 2021 Published by Elsevier B.V.
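To make the two ideas concrete, below is a minimal PyTorch sketch of how incorrect-logit rectification and a per-sample distillation temperature could be wired into a KD loss. The swap-based rectification rule, the teacher-confusion measure, and the hyper-parameters base_t and beta are illustrative assumptions based only on the abstract, not the authors' published equations.

```python
import torch
import torch.nn.functional as F

def adjust_logits(teacher_logits, labels):
    """Logits Adjustment (LA), sketched: when the teacher's top-1 prediction
    disagrees with the ground truth, swap the ground-truth logit with the
    current maximum so the rectified target ranks the true class first
    (assumed rectification rule)."""
    adjusted = teacher_logits.clone()
    pred = adjusted.argmax(dim=1)
    wrong = (pred != labels).nonzero(as_tuple=True)[0]
    gt_vals = adjusted[wrong, labels[wrong]].clone()
    max_vals = adjusted[wrong, pred[wrong]].clone()
    adjusted[wrong, labels[wrong]] = max_vals
    adjusted[wrong, pred[wrong]] = gt_vals
    return adjusted

def dtd_kd_loss(student_logits, teacher_logits, labels, base_t=4.0, beta=2.0):
    """Dynamic Temperature Distillation (DTD), sketched: derive a per-sample
    temperature from how confused the teacher is about a sample (here, the
    gap between its top-1 probability and its ground-truth probability),
    so confusing samples receive a sharper, more heavily weighted target."""
    with torch.no_grad():
        probs = F.softmax(teacher_logits, dim=1)
        gt_prob = probs.gather(1, labels.unsqueeze(1)).squeeze(1)
        confusion = (probs.max(dim=1).values - gt_prob).clamp(min=0.0)
        temps = (base_t - beta * confusion).clamp(min=1.0).unsqueeze(1)  # (B, 1)

    teacher_soft = F.softmax(adjust_logits(teacher_logits, labels) / temps, dim=1)
    student_log_soft = F.log_softmax(student_logits / temps, dim=1)
    kd = F.kl_div(student_log_soft, teacher_soft, reduction="none").sum(dim=1)
    kd = (kd * temps.squeeze(1) ** 2).mean()  # usual T^2 gradient scaling
    return F.cross_entropy(student_logits, labels) + kd

# Usage on dummy data: batch of 8 samples, 10 classes.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = dtd_kd_loss(student_logits, teacher_logits, labels)
loss.backward()
```

Because the per-sample temperature only rescales the logits inside the existing KD term, this kind of modification adds no trainable parameters and negligible compute, which is consistent with the abstract's claim that LA and DTD attach to existing KD frameworks at no extra cost.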

Bibliographic information

  • Source
    Neurocomputing | 2021, Issue 24 | pp. 25-33 | 9 pages
  • Author affiliation

    Xian Jiaotong Univ Sch Informat & Commun Engn Minist Educ Key Lab Intelligent Networks & Network Secur Xianning West Rd 28 Xian 710049 Shaanxi Peoples R China;

  • Indexed in: Science Citation Index (SCI), USA; Engineering Index (EI), USA
  • Format: PDF
  • Language: English (eng)
  • CLC classification:
  • Keywords

    Knowledge distillation; Label regularization; Hard example mining;

