AAAI Conference on Artificial Intelligence

Training Deep Neural Networks in Generations: A More Tolerant Teacher Educates Better Students


Abstract

We focus on the problem of training a deep neural network in generations. The pipeline is as follows: in order to optimize the target network (the student), another network (the teacher) with the same architecture is first trained and then used to provide part of the supervision signal in the next stage. While this strategy leads to higher accuracy, many aspects (e.g., why teacher-student optimization helps) still need further exploration. This paper studies the problem from the perspective of controlling the strictness with which the teacher network is trained. Existing approaches mostly use a hard distribution (e.g., one-hot vectors) in training, leading to a strict teacher that itself has high accuracy, but we argue that the teacher needs to be more tolerant, even though this often implies lower accuracy. The implementation is very easy: an extra loss term is added to the teacher network, encouraging a few secondary classes to emerge and complement the primary class. Consequently, the teacher provides a milder supervision signal (a less peaked distribution), which makes it possible for the student to learn from inter-class similarity and potentially lowers the risk of over-fitting. Experiments are performed on standard image classification tasks (CIFAR100 and ILSVRC2012). Although the teacher network is less powerful, the students show a persistent growth in ability and eventually achieve higher classification accuracy than other competitors. Model ensemble and transfer feature extraction also verify the effectiveness of our approach.
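The idea sketched in the abstract can be illustrated in a few lines of code. The following PyTorch-style snippet is a minimal sketch, not the authors' exact formulation: the specific form of the tolerance term (here, a penalty on the gap between the ground-truth probability and the sum of the top-K secondary probabilities), the weights `lam`, `alpha`, `temperature`, and all function names are assumptions introduced only for illustration.

```python
import torch
import torch.nn.functional as F

def tolerant_teacher_loss(logits, targets, k=5, lam=0.5):
    """Cross-entropy plus an illustrative 'tolerance' term that discourages
    an overly peaked output by shrinking the gap between the ground-truth
    probability and the sum of the top-k secondary probabilities."""
    ce = F.cross_entropy(logits, targets)
    probs = F.softmax(logits, dim=1)
    primary = probs.gather(1, targets.unsqueeze(1)).squeeze(1)   # p_y
    # zero out the ground-truth class, then take the top-k remaining classes
    masked = probs.scatter(1, targets.unsqueeze(1), 0.0)
    secondary = masked.topk(k, dim=1).values.sum(dim=1)          # sum of top-k secondary probs
    gap = (primary - secondary).clamp(min=0).mean()              # penalize a large gap
    return (1 - lam) * ce + lam * gap

def student_distillation_loss(student_logits, teacher_logits, targets,
                              temperature=2.0, alpha=0.7):
    """Standard teacher-student distillation: soft targets from the
    (tolerant) teacher plus the usual hard-label cross-entropy."""
    t = temperature
    soft_teacher = F.softmax(teacher_logits / t, dim=1)
    log_soft_student = F.log_softmax(student_logits / t, dim=1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)
    ce = F.cross_entropy(student_logits, targets)
    return alpha * kd + (1 - alpha) * ce

# Usage sketch (generation g -> g+1):
#   teacher_loss = tolerant_teacher_loss(teacher(x), y)   # train teacher, then freeze it
#   student_loss = student_distillation_loss(student(x), teacher(x).detach(), y)
```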
