IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

A Performance Improvement Approach for Second-Order Optimization in Large Mini-batch Training

Abstract

Classical learning theory states that when the number of parameters of a model is too large relative to the amount of data, the model will overfit and its generalization performance will deteriorate. However, it has been shown empirically that deep neural networks (DNNs) can achieve high generalization capability when trained with extremely large amounts of data and model parameters, exceeding the predictions of classical learning theory. One drawback is that training such DNNs requires enormous computation time, so the training time must be reduced through large-scale parallelization. Straightforward data parallelization of DNN training, however, degrades convergence and generalization. In the present work, we investigate the possibility of using second-order methods to close this generalization gap in large-batch training. This is motivated by our observation that each mini-batch becomes statistically more stable as the batch size grows, so accounting for the curvature of the loss plays a more important role in large-batch training. We also found that naively applying the natural gradient method causes generalization performance to deteriorate further, owing to its lack of regularization capability. We propose an improved second-order method that smooths the loss function, which allows second-order methods to generalize as well as mini-batch SGD.
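The abstract names two ingredients: a natural-gradient-style second-order update and smoothing of the loss function. The sketch below is a minimal NumPy illustration of how these two pieces can combine, not the paper's implementation: the abstract does not specify the smoothing scheme, so the Gaussian parameter-perturbation Monte Carlo estimate used here is an assumption, and all names (`natural_gradient_step`, `grad_fn`, `fisher_fn`, `smooth_sigma`) are hypothetical.

```python
import numpy as np

def natural_gradient_step(theta, grad_fn, fisher_fn, lr=0.1, damping=1e-3,
                          smooth_sigma=0.01, n_samples=4, rng=None):
    """One hypothetical update: a natural-gradient step on a smoothed loss.

    The gradient of the smoothed loss L_sigma(theta) = E[L(theta + eps)],
    eps ~ N(0, sigma^2 I), is estimated by averaging gradients at randomly
    perturbed parameters; the update then preconditions this gradient with
    the inverse damped Fisher matrix: theta <- theta - lr * F^{-1} g.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    # Monte Carlo estimate of the smoothed-loss gradient.
    g = np.mean(
        [grad_fn(theta + smooth_sigma * rng.standard_normal(theta.shape))
         for _ in range(n_samples)],
        axis=0,
    )
    # Damping keeps the Fisher matrix invertible and well conditioned.
    F = fisher_fn(theta) + damping * np.eye(theta.size)
    return theta - lr * np.linalg.solve(F, g)

# Toy usage on a quadratic loss 0.5 * theta @ A @ theta, whose curvature is A.
A = np.diag([1.0, 10.0])
theta = np.array([1.0, 1.0])
for _ in range(20):
    theta = natural_gradient_step(theta, lambda th: A @ th, lambda th: A)
print(theta)  # shrinks toward the minimizer at the origin
```

Solving with the explicit Fisher matrix is only feasible for this toy dimension; at DNN scale, second-order methods typically rely on structured approximations of the Fisher matrix (e.g., Kronecker-factored ones) rather than a dense inverse.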