International Joint Conference on Artificial Intelligence

Taming the Noisy Gradient: Train Deep Neural Networks with Small Batch Sizes


Abstract

Deep learning architectures typically have millions of parameters, which creates a memory problem when training deep neural networks with stochastic-gradient-descent-type methods using large batch sizes. However, training with small batch sizes tends to produce low-quality solutions due to the large variance of the stochastic gradients. In this paper, we tackle this problem by proposing a new framework for training deep neural networks with small batches and noisy gradients. During optimization, our method iteratively applies a proximal-type regularizer to make the loss function strongly convex. This regularizer stabilizes the gradients, leading to better training performance. We prove that our algorithm achieves a convergence rate comparable to that of vanilla SGD even with small batch sizes. Our framework is simple to implement and can potentially be combined with many existing optimization algorithms. Empirical results show that our method outperforms SGD and Adam when the batch size is small. Our implementation is available at https://github.com/huiqu18/TRAlgorithm.
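The abstract describes the method only at a high level, so the following is a minimal PyTorch sketch of the general idea of iteratively adding a proximal-type term to the small-batch loss. It is not the authors' algorithm (their implementation is at the GitHub link above); the anchoring schedule and the hyperparameters lam and anchor_every are hypothetical choices made purely for illustration.

import torch

def train_with_proximal_term(model, loss_fn, loader,
                             epochs=10, lr=0.01, lam=0.1, anchor_every=100):
    # Plain SGD on the regularized objective; any base optimizer could be swapped in.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    # Anchor point w_anchor for the proximal term (detached copy of the weights).
    anchor = [p.detach().clone() for p in model.parameters()]
    step = 0
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            # Proximal-type regularizer: (lam / 2) * ||w - w_anchor||^2.
            prox = sum(((p - a) ** 2).sum()
                       for p, a in zip(model.parameters(), anchor))
            (loss + 0.5 * lam * prox).backward()
            opt.step()
            step += 1
            if step % anchor_every == 0:
                # Periodically move the anchor to the current iterate,
                # proximal-point style, so training can keep making progress.
                anchor = [p.detach().clone() for p in model.parameters()]
    return model

The quadratic term makes the regularized minibatch objective strongly convex around the anchor, roughly whenever lam exceeds the magnitude of the most negative curvature of the loss, which is the stabilizing effect the abstract attributes to the proximal regularizer; periodically moving the anchor lets the iterates continue to track the original objective.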
