Overfitting frequently occurs in deep learning. In this paper, we propose a novel regularization method called drop-activation to reduce overfitting and improve generalization. The key idea is to randomly drop nonlinear activation functions, by setting them to identity functions, during training. During testing, we use a deterministic network with a new activation function that encodes the average effect of dropping activations randomly. Our theoretical analyses support the regularization effect of drop-activation as implicit parameter reduction and verify its compatibility with batch normalization (Ioffe and Szegedy, in Batch normalization: accelerating deep network training by reducing internal covariate shift, arXiv:1502.03167, 2015). Experimental results on CIFAR10, CIFAR100, SVHN, EMNIST, and ImageNet show that drop-activation generally improves the performance of popular neural network architectures on the image classification task. Furthermore, as a regularizer, drop-activation can be used in harmony with standard training and regularization techniques such as batch normalization and AutoAugment (Cubuk et al., in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 113-123, 2019). The code is available at .
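To make the train/test behavior concrete, the following is a minimal PyTorch-style sketch of such a layer, not the authors' released implementation: it assumes a ReLU nonlinearity, and the keep probability `p` and the element-wise sampling of the mask are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DropActivation(nn.Module):
    """Randomly replaces a ReLU with the identity during training.

    At test time the layer applies the deterministic average
    p * relu(x) + (1 - p) * x, where p is the probability of
    keeping the nonlinearity. (Sketch only; the element-wise
    sampling scheme and default p are assumptions.)
    """

    def __init__(self, p: float = 0.95):
        super().__init__()
        self.p = p  # probability of keeping the ReLU nonlinearity

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Sample an element-wise mask: 1 -> keep ReLU, 0 -> identity.
            mask = torch.bernoulli(torch.full_like(x, self.p))
            return mask * torch.relu(x) + (1.0 - mask) * x
        # Deterministic test-time activation: the expectation over the mask.
        return self.p * torch.relu(x) + (1.0 - self.p) * x
```

In use, such a module would simply replace each `nn.ReLU` in a network; at test time the averaged activation behaves like a leaky-ReLU-shaped nonlinearity with slope 1 - p on the negative part.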