Efficient knowledge distillation of teacher model to multiple student models

IEEE International Conference on Industry 4.0, Artificial Intelligence, and Communications Technology


Abstract

Deep learning models have proven to deliver satisfactory results when trained to capture complex non-linear relationships between a set of input features and different task outputs. However, they are memory intensive and require substantial computational power for both training and inference. The literature offers various model compression techniques that enable easy deployment on edge devices. Knowledge distillation is one such approach, in which the knowledge of a complex teacher model is transferred to a student model with fewer parameters. However, a limitation is that the architecture of the student model must remain comparable to that of the complex teacher model for effective knowledge transfer. Because of this limitation, a student model that learns from a huge, complex teacher cannot be deployed on edge devices. In this work, we propose a combined student approach in which different student models learn from a common teacher model. Further, we propose a unique loss function that trains multiple student models simultaneously. An advantage of this approach is that the student models can be far simpler than both a traditional single student model and the complex teacher model. Finally, we provide an extensive evaluation showing that our approach improves overall accuracy significantly and allows a further compression of 10% compared with the generic model.
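The abstract describes a shared loss that trains several students against one teacher simultaneously, but does not spell out its formulation. The sketch below is a minimal illustration assuming the standard Hinton-style distillation recipe (temperature-scaled KL divergence to the teacher's soft labels plus hard-label cross-entropy, summed over students); the function and parameter names are hypothetical, not the paper's:

```python
import torch.nn.functional as F

def multi_student_distillation_loss(teacher_logits, student_logits_list,
                                    targets, T=4.0, alpha=0.5):
    # Soft labels from the (frozen) teacher; detach so no gradient
    # flows back into the teacher.
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=1)
    total = 0.0
    for s_logits in student_logits_list:
        # Temperature-scaled KL divergence to the teacher's soft labels,
        # rescaled by T^2 as in Hinton et al.
        kd = F.kl_div(F.log_softmax(s_logits / T, dim=1),
                      soft_targets, reduction="batchmean") * (T * T)
        # Ordinary cross-entropy against the ground-truth hard labels.
        ce = F.cross_entropy(s_logits, targets)
        total = total + alpha * kd + (1.0 - alpha) * ce
    return total
```

Summing the per-student terms means a single backward pass updates every student at once, which matches the abstract's claim of training multiple student models simultaneously from a common teacher.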
