IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

A Novel Stochastic Gradient Descent Algorithm Based on Grouping over Heterogeneous Cluster Systems for Distributed Deep Learning



Abstract

On heterogeneous cluster systems, the convergence of neural network models suffers greatly from the differing performance of individual machines. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm for distributed deep learning, named Grouping-SGD, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups such that machines in the same group have similar performance. Machines within a group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are made responsible for updating the model parameters from the lower layers to the higher layers, respectively. Experimental results on popular image classification benchmarks (MNIST, Cifar10, Cifar100, and ImageNet) indicate that Grouping-SGD achieves speedups of 1.2 to 3.7 times over Sync-SGD, Async-SGD, and Stale-SGD.
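To make the grouping and server/layer-assignment ideas above concrete, here is a minimal sketch; the names used (Worker, ParameterServer, partition_into_groups, assign_layers_to_servers) are illustrative assumptions and not the authors' implementation. It shows workers being ranked by measured throughput and split into groups of similar speed (so that the synchronous step inside a group is not stalled by a straggler, while groups proceed asynchronously with respect to each other), and parameter servers being sorted from fast to slow and assigned the model's layers from lowest to highest.

```python
# Minimal sketch of the grouping and server/layer assignment described in the
# abstract; the class and function names are illustrative assumptions, not the
# authors' code.
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Worker:
    name: str
    throughput: float  # measured training samples per second on this machine


@dataclass
class ParameterServer:
    name: str
    speed: float  # relative update speed of this server


def partition_into_groups(workers: List[Worker], num_groups: int) -> List[List[Worker]]:
    """Rank workers by throughput and split them into groups of similar speed,
    so synchronous averaging inside a group is not stalled by a much slower
    straggler, while different groups push their updates asynchronously."""
    ranked = sorted(workers, key=lambda w: w.throughput, reverse=True)
    size = (len(ranked) + num_groups - 1) // num_groups
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]


def assign_layers_to_servers(servers: List[ParameterServer],
                             num_layers: int) -> Dict[str, List[int]]:
    """Arrange parameter servers from fast to slow and hand out the model's
    layers from the lowest to the highest, so the fastest server owns the
    lowest layers (a simple contiguous split, one possible reading of the
    scheme in the abstract)."""
    ranked = sorted(servers, key=lambda s: s.speed, reverse=True)
    per_server = (num_layers + len(ranked) - 1) // len(ranked)
    layers = list(range(num_layers))
    return {s.name: layers[i * per_server:(i + 1) * per_server]
            for i, s in enumerate(ranked)}


if __name__ == "__main__":
    workers = [Worker("gpu-a", 950.0), Worker("gpu-b", 900.0),
               Worker("cpu-a", 210.0), Worker("cpu-b", 180.0)]
    print(partition_into_groups(workers, num_groups=2))

    servers = [ParameterServer("ps-slow", 0.4), ParameterServer("ps-fast", 1.0)]
    print(assign_layers_to_servers(servers, num_layers=8))
```

Grouping machines of similar throughput bounds the synchronization cost of each group to the spread within that group rather than across the whole heterogeneous cluster, which is the intuition behind the faster convergence claimed above.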
