IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

A Performance Improvement Approach for Second-Order Optimization in Large Mini-batch Training

Abstract

Classical learning theory states that when the number of parameters of a model is too large relative to the amount of data, the model will overfit and its generalization performance will deteriorate. However, it has been shown empirically that deep neural networks (DNNs) can achieve high generalization capability when trained with extremely large amounts of data and model parameters, exceeding the predictions of classical learning theory. One drawback is that training such DNNs requires enormous computation time, so the training time must be reduced through large-scale parallelization. Straightforward data parallelization of DNN training, however, degrades convergence and generalization. In the present work, we investigate the possibility of using second-order methods to close this generalization gap in large-batch training. This is motivated by our observation that each mini-batch becomes statistically more stable as the batch size grows, so accounting for the curvature of the loss plays a more important role in large-batch training. We also found that naively applying the natural gradient method causes generalization performance to deteriorate further, owing to its lack of regularization capability. We propose an improved second-order method that smooths the loss function, which allows second-order methods to generalize as well as mini-batch SGD.
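The abstract names two ingredients: a natural-gradient-style second-order update and smoothing of the loss function. The sketch below is a minimal NumPy illustration of how these two pieces can combine, not the paper's implementation: the abstract does not specify the smoothing scheme, so the Gaussian parameter-perturbation Monte Carlo estimate used here is an assumption, and all names (`natural_gradient_step`, `grad_fn`, `fisher_fn`, `smooth_sigma`) are hypothetical.

```python
import numpy as np

def natural_gradient_step(theta, grad_fn, fisher_fn, lr=0.1, damping=1e-3,
                          smooth_sigma=0.01, n_samples=4, rng=None):
    """One hypothetical update: a natural-gradient step on a smoothed loss.

    The gradient of the smoothed loss L_sigma(theta) = E[L(theta + eps)],
    eps ~ N(0, sigma^2 I), is estimated by averaging gradients at randomly
    perturbed parameters; the update then preconditions this gradient with
    the inverse damped Fisher matrix: theta <- theta - lr * F^{-1} g.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Monte Carlo estimate of the smoothed-loss gradient.
    g = np.mean(
        [grad_fn(theta + smooth_sigma * rng.standard_normal(theta.shape))
         for _ in range(n_samples)],
        axis=0,
    )
    # Damping keeps the Fisher matrix invertible and well conditioned.
    F = fisher_fn(theta) + damping * np.eye(theta.size)
    return theta - lr * np.linalg.solve(F, g)

# Toy usage on a quadratic loss 0.5 * theta @ A @ theta, whose curvature is A.
A = np.diag([1.0, 10.0])
theta = np.array([1.0, 1.0])
for _ in range(20):
    theta = natural_gradient_step(theta, lambda th: A @ th, lambda th: A)
print(theta)  # shrinks toward the minimizer at the origin
```

Solving with the explicit Fisher matrix is only feasible for this toy dimension; at DNN scale, second-order methods typically rely on structured approximations of the Fisher matrix (e.g., Kronecker-factored ones) rather than a dense inverse.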