24th ACM International Conference on Supercomputing (ICS 2010)

Optimal Bucket Algorithms for Large MPI Collectives on Torus Interconnects



Abstract

Collectives are an important and frequently used component of MPI. Bucket algorithms, also known as "large vector" algorithms, were introduced in the early 1990s and have since evolved into a well-known paradigm for large MPI collectives. Many modern supercomputers, such as the IBM Blue Gene and Cray XT, are based on torus interconnects, which offer a highly scalable interconnection architecture for distributed memory systems. While near-optimal algorithms have been developed for torus interconnects in other paradigms, such as spanning trees, bucket algorithms have not been optimally extended to these networks. In this paper, we study the basic "divide, distribute and gather" MPI collectives for bucket algorithms - Allgather, Reduce-scatter and Allreduce - for large messages on torus interconnects.

We present bucket-based algorithms for these collectives on bidirectional links. We show that these algorithms are optimal in terms of bandwidth and computation for symmetric torus networks (i.e. when all the dimensions are equal), matching the theoretical lower bounds. For an asymmetric torus, our algorithms are asymptotically optimal and converge to the lower bound for large dimension sizes. We also argue that our bucket algorithms are more scalable on multi-cores in comparison to spanning tree algorithms. Previous studies of bucket algorithms on torus interconnects have focused on unidirectional links and have been unable to obtain tight lower bounds and optimal algorithms. We close this gap by providing stronger lower bounds and showing that our bidirectional algorithms can easily be adapted to the unidirectional case, matching our lower bounds in terms of bandwidth and computational complexity.

We implement our algorithms on the IBM Blue Gene/P supercomputer, which has quad-core nodes connected in a 3-dimensional torus, using the low-level communication interface. We demonstrate that our algorithms perform within 7-30% of the lower bounds for different MPI collectives. We demonstrate good scaling using multicores. We also demonstrate a factor of 3 to 17 speedup for various collectives in comparison to the latest optimized MPI implementation.
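The building block behind these collectives is the bucket (ring) exchange: the vector is split into p chunks and, in p-1 steps, each process forwards to its neighbor the chunk it received in the previous step, so every link carries (p-1)/p of the data and the schedule is bandwidth-optimal on a ring. Below is a minimal sketch of this one-dimensional bucket Allgather, which the paper generalizes to bidirectional multi-dimensional tori; the function name, chunk layout, and use of MPI_Sendrecv are illustrative assumptions, not the paper's Blue Gene/P implementation.

/*
 * Minimal sketch of a one-dimensional bucket (ring) Allgather, the
 * classic "large vector" pattern.  Illustrative only: this is not the
 * paper's low-level Blue Gene/P implementation.
 */
#include <mpi.h>

/* Each rank contributes `count` doubles located at recvbuf[rank*count];
 * after p-1 ring steps every rank holds all p chunks. */
static void bucket_allgather(double *recvbuf, int count, MPI_Comm comm)
{
    int rank, p;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &p);

    int right = (rank + 1) % p;       /* send direction around the ring */
    int left  = (rank - 1 + p) % p;   /* receive direction              */

    for (int step = 0; step < p - 1; step++) {
        /* Forward the chunk received in the previous step
         * (initially our own contribution). */
        int send_idx = (rank - step + p) % p;
        int recv_idx = (rank - step - 1 + p) % p;

        MPI_Sendrecv(recvbuf + (size_t)send_idx * count, count, MPI_DOUBLE,
                     right, 0,
                     recvbuf + (size_t)recv_idx * count, count, MPI_DOUBLE,
                     left, 0, comm, MPI_STATUS_IGNORE);
    }
}

Roughly speaking, the paper's torus algorithms partition the message across dimensions and link directions so that ring exchanges of this kind run concurrently over all links, which is the source of the bandwidth optimality claimed for symmetric tori.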
