Journal of Supercomputing

BOA: batch orchestration algorithm for straggler mitigation of distributed DL training in heterogeneous GPU cluster


Abstract

Training a deep learning model is a time-consuming job since it usually involves a large amount of data. To reduce training time, most practitioners train their models in a distributed fashion on a GPU cluster. Synchronous stochastic gradient descent (SGD), one of the most widely used distributed training algorithms, converges quickly when multiple GPU workers are used, but its speed is bound by the slowest worker, i.e., the straggler. In a heterogeneous environment, a static straggler, which has received little attention so far, degrades performance more than a randomly occurring (dynamic) straggler. However, most existing studies on straggler mitigation assume a homogeneous environment, so their approaches are of limited use in practice. In this paper, we scrutinize the straggler problem in a heterogeneous environment and, from empirical results, define static and dynamic stragglers. Based on this, we propose a novel approach called the batch orchestration algorithm (BOA) for straggler mitigation. It adaptively balances the mini-batch size assigned to each worker according to that worker's speed; thus BOA can mitigate both static and dynamic stragglers in a modern GPU cluster. BOA finds the optimal mini-batch sizes by solving a min-max integer program built on hardware-agnostic performance models. For verification, several experiments are conducted on a cluster with up to six GPUs of three types: GTX 1080, GTX 1060, and Quadro M2000. The results show that BOA mitigates both types of stragglers and accelerates synchronous SGD training compared to another straggler mitigation method.
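To illustrate the core idea of speed-proportional batch orchestration, the following is a minimal sketch (not the paper's actual min-max integer program) that splits a global mini-batch across heterogeneous workers so that each worker's per-iteration compute time is roughly equalized. The function name, the use of measured per-sample times as the performance model, and the greedy rounding of leftover samples are all illustrative assumptions.

```python
def orchestrate_batches(per_sample_time, global_batch):
    """Split `global_batch` samples across workers so the slowest worker's
    iteration time is (approximately) minimized.

    per_sample_time: measured seconds per sample for each worker
                     (a stand-in for a hardware-agnostic performance model).
    Returns: list of per-worker mini-batch sizes summing to `global_batch`.
    """
    # Allocate proportionally to speed: b_i ∝ 1 / t_i equalizes b_i * t_i.
    speeds = [1.0 / t for t in per_sample_time]
    total_speed = sum(speeds)
    alloc = [int(global_batch * s / total_speed) for s in speeds]

    # Greedily hand leftover samples (lost to rounding) to the worker
    # whose iteration time would grow the least.
    leftover = global_batch - sum(alloc)
    for _ in range(leftover):
        i = min(range(len(alloc)),
                key=lambda j: (alloc[j] + 1) * per_sample_time[j])
        alloc[i] += 1
    return alloc


# Example: one fast, one medium, one slow worker (e.g. heterogeneous GPUs).
sizes = orchestrate_batches([1.0, 2.0, 4.0], global_batch=14)
print(sizes)  # [8, 4, 2] -> every worker takes ~8 time units per iteration
```

With equal per-worker times, no worker idles waiting for a straggler at the synchronization barrier; re-measuring `per_sample_time` each iteration would extend this static scheme to dynamic stragglers.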

