Journal of Parallel and Distributed Computing

Kernel-assisted and topology-aware MPI collective communications on multicore/many-core platforms

Abstract

Multicore clusters, which have become the most prominent form of High Performance Computing (HPC) systems, challenge the performance of MPI applications with non-uniform memory accesses and shared cache hierarchies. Recent advances in MPI collective communications have alleviated the performance issues exposed by deep memory hierarchies by carefully considering the mapping between the collective topology and the hardware topology, as well as by using single-copy kernel-assisted mechanisms. However, in distributed environments, a single-level approach cannot encompass the extreme variations not only in bandwidth and latency, but also in the capability to support duplex communications or to operate multiple concurrent copies. This calls for a collaborative approach between multiple layers of collective algorithms, dedicated to extracting the maximum degree of parallelism from the collective algorithm by consolidating the intra- and inter-node communications. In this work, we present HierKNEM, a kernel-assisted topology-aware collective framework, and the mechanisms deployed by this framework to orchestrate the collaboration between multiple layers of collective algorithms. The resulting scheme maximizes the overlap of intra- and inter-node communications. We demonstrate experimentally, by considering three of the most used collective operations (Broadcast, Allgather and Reduction), that (1) this approach is immune to modifications of the underlying process-core binding; (2) it outperforms state-of-the-art MPI libraries (Open MPI, MPICH2 and MVAPICH2), demonstrating up to a 30x speedup for synthetic benchmarks and up to a 3x acceleration for a parallel graph application (ASP); (3) it furthermore demonstrates a linear speedup with the increase of the number of cores per compute node, a paramount requirement for scalability on future many-core hardware.
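The two-level organization the abstract describes (an inter-node exchange among per-node leader processes, followed by an intra-node fan-out on each node) can be illustrated with a small simulation. This is a hypothetical sketch for clarity, not the HierKNEM implementation: the contiguous rank-to-node layout, the choice of the lowest rank as node leader, and the flat inter-node tree are all assumptions made for illustration.

```python
def hierarchical_bcast(n_ranks, ranks_per_node, root=0):
    """Simulate a two-level broadcast and return the list of point-to-point
    messages (src, dst, level) it generates.

    Assumes ranks are packed contiguously onto nodes and that the lowest
    rank on each node acts as the node leader.
    """
    node_of = lambda r: r // ranks_per_node
    leaders = [n * ranks_per_node for n in range(n_ranks // ranks_per_node)]
    root_leader = node_of(root) * ranks_per_node

    msgs = []
    # Step 0: if the root is not its node's leader, hand the data off locally.
    if root != root_leader:
        msgs.append((root, root_leader, "intra"))
    # Step 1 (inter-node level): the root's leader sends to every other
    # node leader (a flat tree here; a real framework would pipeline this).
    for l in leaders:
        if l != root_leader:
            msgs.append((root_leader, l, "inter"))
    # Step 2 (intra-node level): each leader fans the data out to the
    # remaining ranks on its own node (e.g. via single-copy kernel assist).
    for l in leaders:
        for r in range(l, l + ranks_per_node):
            if r != l and r != root:
                msgs.append((l, r, "intra"))
    return msgs

# Example: 8 ranks on 2 nodes of 4 cores each, broadcasting from rank 0.
messages = hierarchical_bcast(8, 4, root=0)
```

Because the intra-node fan-outs on different nodes are independent of each other and of later inter-node sends, a real implementation can overlap the two levels, which is the source of the consolidation benefit the abstract claims.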
