Journal of Visual Communication & Image Representation

Isomorphic model-based initialization for convolutional neural networks



Abstract

Modern deep convolutional neural networks (CNNs) are often designed to be scalable, giving rise to the concept of a model family: a large (possibly infinite) collection of related neural network architectures. The isomorphism of a model family refers to the fact that the models within it share the same high-level structure; models within the same family are said to be isomorphic to each other. Existing weight initialization methods for CNNs use random or data-driven initialization. Although these methods can produce satisfactory initializations, the isomorphism of model families is rarely exploited. This work proposes an isomorphic model-based initialization method (IM Init) for CNNs, which can initialize any network from another well-trained isomorphic model in the same model family. We first formulate the general network structure widely used in CNNs, then present a structural weight transformation that maps the weights of one isomorphic model onto another. Finally, we apply IM Init to the model down-sampling and up-sampling scenarios and confirm its effectiveness in improving accuracy and convergence speed through experiments on various image classification datasets. In the down-sampling scenario, IM Init initializes a smaller target model from a larger well-trained source model, improving the accuracy of RegNet200MF by 1.59% on the CIFAR-100 dataset and 1.9% on the CUB200 dataset. Conversely, in the up-sampling scenario, IM Init initializes a larger target model from a smaller well-trained source model, significantly speeding up the convergence of RegNet600MF and improving accuracy by 30.10% under short training schedules. Code will be available.
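The abstract does not spell out the structural weight transformation, but the two scenarios it describes can be illustrated with a minimal sketch. The slicing rule below (keeping the leading output/input channels when shrinking, and padding the remainder with small random values when growing) is a hypothetical simplification for illustration, not necessarily the paper's actual transformation:

```python
import numpy as np

def im_init_downsample(w_src, c_out, c_in):
    """Down-sampling scenario: initialize a smaller conv layer from a
    larger isomorphic one by keeping the leading output/input channels
    (a simple slicing rule chosen here for illustration)."""
    return w_src[:c_out, :c_in].copy()

def im_init_upsample(w_src, c_out, c_in, rng):
    """Up-sampling scenario: initialize a larger conv layer by copying
    the trained weights into the leading channels and randomly
    initializing the remaining ones."""
    k_h, k_w = w_src.shape[2], w_src.shape[3]
    w_tgt = rng.normal(0.0, 0.01, size=(c_out, c_in, k_h, k_w))
    s_out, s_in = w_src.shape[0], w_src.shape[1]
    w_tgt[:s_out, :s_in] = w_src  # transfer the well-trained weights
    return w_tgt

# Toy example: a 3x3 conv weight with 8 output and 4 input channels,
# stored in the usual (out_channels, in_channels, k_h, k_w) layout.
rng = np.random.default_rng(0)
w_large = rng.normal(size=(8, 4, 3, 3))
w_small = im_init_downsample(w_large, 4, 2)    # shrink to 4x2 channels
w_big = im_init_upsample(w_small, 8, 4, rng)   # grow back to 8x4 channels
print(w_small.shape, w_big.shape)              # (4, 2, 3, 3) (8, 4, 3, 3)
```

Applied layer by layer across two models with the same high-level structure, such a transformation gives the target model a trained starting point instead of a purely random one, which is what the reported accuracy and convergence gains are attributed to.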
