International Joint Conference on Artificial Intelligence

Taming the Noisy Gradient: Train Deep Neural Networks with Small Batch Sizes


Abstract

Deep learning architectures typically have millions of parameters, which creates a memory problem when training deep neural networks with stochastic-gradient-descent-type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of the stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. This regularizer stabilizes the gradients, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to that of vanilla SGD even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
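The abstract describes the method only at a high level, so the following is a minimal PyTorch sketch of the general idea of iteratively adding a proximal-type term to the small-batch loss. It is not the authors' algorithm (their implementation is at the GitHub link above); the anchoring schedule and the hyperparameters lam and anchor_every are hypothetical choices made purely for illustration.

import torch

def train_with_proximal_term(model, loss_fn, loader,
                             epochs=10, lr=0.01, lam=0.1, anchor_every=100):
    # Plain SGD on the regularized objective; any base optimizer could be swapped in.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    # Anchor point w_anchor for the proximal term (detached copy of the weights).
    anchor = [p.detach().clone() for p in model.parameters()]
    step = 0
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # Proximal-type regularizer: (lam / 2) * ||w - w_anchor||^2.
            prox = sum(((p - a) ** 2).sum()
                       for p, a in zip(model.parameters(), anchor))
            (loss + 0.5 * lam * prox).backward()
            opt.step()
            step += 1
            if step % anchor_every == 0:
                # Periodically move the anchor to the current iterate,
                # proximal-point style, so training can keep making progress.
                anchor = [p.detach().clone() for p in model.parameters()]
    return model

The quadratic term makes the regularized minibatch objective strongly convex around the anchor, roughly whenever lam exceeds the magnitude of the most negative curvature of the loss, which is the stabilizing effect the abstract attributes to the proximal regularizer; periodically moving the anchor lets the iterates continue to track the original objective.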
