Journal of Parallel and Distributed Computing

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Abstract

Multicore clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issues exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the hardware topology, as well as by using single-copy kernel-assisted mechanisms. However, in distributed environments, a single-level approach cannot encompass the extreme variations not only in bandwidth and latency, but also in the capability to support duplex communications or to operate multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications. In this work, we present HierKNEM, a kernel-assisted topology-aware collective framework, and the mechanisms deployed by this framework to orchestrate the collaboration between multiple layers of collective algorithms. The resulting scheme maximizes the overlap of intra- and inter-node communications. We demonstrate experimentally, by considering three of the most used collective operations (Broadcast, Allgather and Reduction), that (1) this approach is immune to modifications of the underlying process-core binding; (2) it outperforms state-of-the-art MPI libraries (Open MPI, MPICH2 and MVAPICH2), demonstrating up to a 30x speedup for synthetic benchmarks and up to a 3x acceleration for a parallel graph application (ASP); (3) it furthermore demonstrates a linear speedup with the increase of the number of cores per compute node, a paramount requirement for scalability on future many-core hardware.
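The two-level organization the abstract describes (an inter-node exchange among per-node leader processes, followed by an intra-node fan-out on each node) can be illustrated with a small simulation. This is a hypothetical sketch for clarity, not the HierKNEM implementation: the contiguous rank-to-node layout, the choice of the lowest rank as node leader, and the flat inter-node tree are all assumptions made for illustration.

```python
def hierarchical_bcast(n_ranks, ranks_per_node, root=0):
    """Simulate a two-level broadcast and return the list of point-to-point
    messages (src, dst, level) it generates.

    Assumes ranks are packed contiguously onto nodes and that the lowest
    rank on each node acts as the node leader.
    """
    node_of = lambda r: r // ranks_per_node
    leaders = [n * ranks_per_node for n in range(n_ranks // ranks_per_node)]
    root_leader = node_of(root) * ranks_per_node

    msgs = []
    # Step 0: if the root is not its node's leader, hand the data off locally.
    if root != root_leader:
        msgs.append((root, root_leader, "intra"))
    # Step 1 (inter-node level): the root's leader sends to every other
    # node leader (a flat tree here; a real framework would pipeline this).
    for l in leaders:
        if l != root_leader:
            msgs.append((root_leader, l, "inter"))
    # Step 2 (intra-node level): each leader fans the data out to the
    # remaining ranks on its own node (e.g. via single-copy kernel assist).
    for l in leaders:
        for r in range(l, l + ranks_per_node):
            if r != l and r != root:
                msgs.append((l, r, "intra"))
    return msgs

# Example: 8 ranks on 2 nodes of 4 cores each, broadcasting from rank 0.
messages = hierarchical_bcast(8, 4, root=0)
```

Because the intra-node fan-outs on different nodes are independent of each other and of later inter-node sends, a real implementation can overlap the two levels, which is the source of the consolidation benefit the abstract claims.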
