IEEE Conference on Computer Communications

Preemptive All-reduce Scheduling for Expediting Distributed DNN Training

Abstract

Data-parallel training is widely used for scaling DNN training over large datasets, using the parameter server or all-reduce architecture. Communication scheduling, which overlaps communication with computation by reordering communication operations, is a promising approach to accelerating distributed DNN training. We identify two limitations of prior communication scheduling work. First, a layer-wise computation graph has been a common assumption, while modern machine learning frameworks (e.g., TensorFlow) use a sophisticated directed acyclic graph (DAG) as the execution model. Second, the default tensor sizes are often suboptimal for transmission scheduling and bandwidth utilization. We propose PACE, a communication scheduler that preemptively schedules (potentially fused) all-reduce tensors based on the DAG of DNN training, guaranteeing maximal overlap of communication with computation and high bandwidth utilization. The scheduler contains two integrated modules: given a DAG, we identify the best tensor-preemptive communication schedule that minimizes training time; then, using this optimal schedule as an oracle, a dynamic programming approach generates a good DAG by merging small communication tensors for efficient bandwidth utilization. Experiments on a GPU testbed show that PACE accelerates training under representative system configurations, achieving up to 36% speed-up over state-of-the-art solutions.
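To make the preemptive scheduling idea concrete, here is a minimal sketch, assuming gradient tensors are split into fixed-size chunks and a priority queue orders pending chunks by how early the corresponding layer is needed in the next iteration's forward pass; a newly produced high-priority tensor can thus overtake queued chunks of lower-priority ones. All names here (`PreemptiveScheduler`, `Chunk`, `CHUNK_SIZE`, `do_allreduce`) are illustrative, not from the paper.

```python
import heapq
from dataclasses import dataclass, field

CHUNK_SIZE = 4 * 1024 * 1024  # bytes per chunk; an assumed tunable

@dataclass(order=True)
class Chunk:
    priority: int                       # lower value = needed earlier next iteration
    tensor_id: str = field(compare=False)
    offset: int = field(compare=False)  # byte offset within the tensor
    size: int = field(compare=False)    # bytes in this chunk

class PreemptiveScheduler:
    def __init__(self, do_allreduce):
        self._queue = []                # min-heap ordered by priority
        self._do_allreduce = do_allreduce

    def submit(self, tensor_id, size, priority):
        """Split a ready gradient tensor into chunks and enqueue them."""
        for offset in range(0, size, CHUNK_SIZE):
            chunk = Chunk(priority, tensor_id, offset,
                          min(CHUNK_SIZE, size - offset))
            heapq.heappush(self._queue, chunk)

    def run_one(self):
        """Launch the highest-priority pending chunk, if any.

        A real scheduler would launch asynchronously; this sketch calls
        the all-reduce primitive synchronously for clarity.
        """
        if self._queue:
            chunk = heapq.heappop(self._queue)
            self._do_allreduce(chunk.tensor_id, chunk.offset, chunk.size)
            return True
        return False
```

Chunking bounds the delay before a newly ready, higher-priority tensor gets the link to at most one chunk's transfer time, which is one reason the default (often large) tensor sizes are suboptimal for transmission scheduling.

The second module, tensor fusion via dynamic programming, can be sketched similarly. Under an assumed linear cost model (a fixed startup latency plus size divided by bandwidth, which is not necessarily the paper's cost model), the toy DP below picks fusion boundaries over tensors in their scheduled order so as to minimize the finish time of the last all-reduce; `ALPHA`, `BANDWIDTH`, and `fuse` are hypothetical names.

```python
ALPHA = 1e-3        # assumed per-all-reduce startup latency, seconds
BANDWIDTH = 1e9     # assumed link bandwidth, bytes/second

def fuse(ready, sizes):
    """Pick fusion boundaries over tensors in scheduled order.

    ready[i] is the time tensor i is produced; sizes[i] is its byte size.
    Returns (estimated finish time, list of (start, end) groups).
    """
    n = len(ready)
    dp = [0.0] + [float("inf")] * n   # dp[i]: best finish time for tensors < i
    cut = [0] * (n + 1)
    for i in range(1, n + 1):
        for j in range(i):
            # Fuse tensors j..i-1 into one all-reduce: it starts once the
            # previous group is done and the group's last tensor is ready.
            start = max(dp[j], max(ready[j:i]))
            finish = start + ALPHA + sum(sizes[j:i]) / BANDWIDTH
            if finish < dp[i]:
                dp[i], cut[i] = finish, j
    groups, i = [], n                 # recover the chosen group boundaries
    while i > 0:
        groups.append((cut[i], i))
        i = cut[i]
    return dp[n], groups[::-1]
```

For example, `fuse([0.0, 0.01, 0.02], [1 << 20, 1 << 18, 1 << 22])` returns an estimated finish time and the chosen groups. Fusing small adjacent tensors amortizes the per-message latency, at the cost of delaying a group's start until its last tensor is ready, which is exactly the trade-off the DP weighs.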
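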
