Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Exhaustive Study of Hierarchical AllReduce Patterns for Large Messages Between GPUs

Abstract

Data-parallel distributed deep learning requires an AllReduce operation across all GPUs, with message sizes on the order of hundreds of megabytes. The most popular AllReduce implementation for deep learning is Ring-AllReduce, but this method suffers from high latency on thousands of GPUs, because its number of communication steps grows linearly with the number of GPUs. Previous efforts have sought to reduce this latency by combining the ring with hierarchical methods that are closer to latency-optimal. In the present work, we treat these hierarchical communication methods as instances of a general hierarchical Ring-AllReduce, with a pure Ring-AllReduce at one end of the spectrum and Rabenseifner's algorithm at the other. We exhaustively test the possible hierarchical partitionings of processes on the ABCI system in Japan, using up to 2048 GPUs. We develop a performance model for this generalized hierarchical Ring-AllReduce and show the lower bound of the effective bandwidth achievable for hierarchical NCCL communication on thousands of GPUs. Our measurements agree well with our performance model. We also find that the optimal large-scale process hierarchy contains the optimal small-scale process hierarchy, which reduces the search space for the optimal communication pattern.
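The spectrum described in the abstract can be illustrated with a minimal sketch. It is not the paper's performance model, which the abstract does not state. It assumes the textbook alpha-beta cost model with a single latency `alpha` (seconds per message) and a single inverse bandwidth `beta` (seconds per byte) shared by every level of the hierarchy, and it ignores reduction compute time. The function names `ordered_factorizations` and `hierarchical_ring_time` are hypothetical and chosen for illustration only. Each hierarchy is written as a tuple of group sizes whose product is the process count: `(p,)` is a pure ring, and a tuple of all 2s has the same step count as Rabenseifner's algorithm for a power-of-two `p`.

```python
def ordered_factorizations(p):
    """Yield every tuple of integers >= 2 whose product is p (order matters).

    Each tuple is one candidate hierarchy: the group size used at each level.
    """
    if p == 1:
        yield ()
        return
    for f in range(2, p + 1):
        if p % f == 0:
            for rest in ordered_factorizations(p // f):
                yield (f,) + rest


def hierarchical_ring_time(levels, nbytes, alpha, beta):
    """Estimated AllReduce time under a uniform alpha-beta model (assumption).

    Level i runs a ring reduce-scatter inside groups of size g, taking
    (g - 1) steps on chunks of nbytes / (product of group sizes so far).
    The allgather phase mirrors the reduce-scatter, hence the factor 2.
    """
    procs_so_far = 1
    total = 0.0
    for g in levels:
        procs_so_far *= g
        chunk = nbytes / procs_so_far
        total += 2 * (g - 1) * (alpha + chunk * beta)
    return total


if __name__ == "__main__":
    # Illustrative values only; they are not measurements from the paper.
    alpha, beta, nbytes = 5e-6, 1e-10, 256 * 2**20
    for levels in ordered_factorizations(8):
        t = hierarchical_ring_time(levels, nbytes, alpha, beta)
        print(levels, f"{t * 1e3:.3f} ms")
```

Under this uniform model the bandwidth term is the same for every hierarchy, and only the latency term, proportional to the sum of (group size - 1) over the levels, changes. On a real machine the bandwidth differs between levels (for example, within a node versus between nodes), which is why the choice of hierarchy is something the paper measures rather than derives from a formula this simple.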