Efficient knowledge distillation of teacher model to multiple student models

IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology


Abstract

Deep learning models have proven to deliver satisfactory results when trained to capture complex non-linear relationships between a set of input features and different task outputs. However, they are memory intensive and require substantial computational power for both training and inference. The literature offers various model compression techniques that enable easy deployment on edge devices. Knowledge distillation is one such approach, in which the knowledge of a complex teacher model is transferred to a student model with fewer parameters. However, a limitation is that the architecture of the student model must remain comparable to that of the complex teacher model for effective knowledge transfer. Because of this limitation, a student model that learns from a huge, complex teacher cannot be deployed on edge devices. In this work, we propose a combined student approach in which different student models learn from a common teacher model. Further, we propose a unique loss function that trains multiple student models simultaneously. An advantage of this approach is that the student models can be far simpler than both a traditional single student model and the complex teacher model. Finally, we provide an extensive evaluation showing that our approach improves overall accuracy significantly and allows a further compression of 10% compared with the generic model.
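The abstract describes a shared loss that trains several students against one teacher simultaneously, but does not spell out its formulation. The sketch below is a minimal illustration assuming the standard Hinton-style distillation recipe (temperature-scaled KL divergence to the teacher's soft labels plus hard-label cross-entropy, summed over students); the function and parameter names are hypothetical, not the paper's:

```python
import torch.nn.functional as F

def multi_student_distillation_loss(teacher_logits, student_logits_list,
                                    targets, T=4.0, alpha=0.5):
    # Soft labels from the (frozen) teacher; detach so no gradient
    # flows back into the teacher.
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=1)
    total = 0.0
    for s_logits in student_logits_list:
        # Temperature-scaled KL divergence to the teacher's soft labels,
        # rescaled by T^2 as in Hinton et al.
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      soft_targets, reduction="batchmean") * (T * T)
        # Ordinary cross-entropy against the ground-truth hard labels.
        ce = F.cross_entropy(s_logits, targets)
        total = total + alpha * kd + (1.0 - alpha) * ce
    return total
```

Summing the per-student terms means a single backward pass updates every student at once, which matches the abstract's claim of training multiple student models simultaneously from a common teacher.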
