Published in: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures

Abstract

Emerging multi-/many-core processors such as Intel Xeon and Xeon Phi are being widely adopted in modern large-scale supercomputing systems. Architectural features of these systems, such as high core density, mesh interconnects, deeper memory hierarchies, and hardware multi-threading, give application developers opportunities to exploit more parallelism. However, they also pose significant challenges for MPI runtimes in optimizing communication performance. One of the major challenges is optimizing collective communication for dense multi-/many-core processors. Traditionally, MPI runtimes have used send/recv, direct shared-memory ("double-copy"), or kernel-assisted ("single-copy") mechanisms for intra-node collective communication. However, existing collective designs based on these mechanisms suffer from several bottlenecks, such as multiple copies, per-message handshakes, and kernel-level lock contention, that limit their performance. In this paper, we first characterize the bottlenecks associated with these approaches to designing MPI collectives. Then, we propose efficient "shared-address-space"-based designs to implement different MPI collectives. Finally, we show the efficacy of our approach by implementing various MPI collectives. Our proposed designs show up to 11x, 50x, 17x, and 5x performance improvement for Bcast, Scatter, Gather, and Alltoall, respectively, over other state-of-the-art MPI libraries on different multi-/many-core architectures.
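
The "multiple copies" bottleneck named in the abstract can be illustrated with a toy, single-process model of an intra-node broadcast. This is only a sketch under stated assumptions, not the authors' design: the function names `bcast_double_copy` and `bcast_shared_address` are hypothetical, a `bytearray` stands in for a shared-memory staging region, and a `memoryview` stands in for a receiver's direct view of the root's buffer. It counts copies only and does not model handshakes or lock contention.

```python
# Toy model of copy counts for an intra-node broadcast.
# Illustrative assumption only; this is not the paper's implementation.

def bcast_double_copy(root_buf: bytes, n_receivers: int):
    """Root copies into a staging buffer; each receiver then copies out of it."""
    copies = 0
    staging = bytearray(len(root_buf))
    staging[:] = root_buf            # copy: root buffer -> staging area
    copies += 1
    received = []
    for _ in range(n_receivers):
        dst = bytearray(len(root_buf))
        dst[:] = staging             # copy: staging area -> receiver buffer
        copies += 1
        received.append(bytes(dst))
    return received, copies


def bcast_shared_address(root_buf: bytes, n_receivers: int):
    """Each receiver copies straight from the root's buffer (no staging step)."""
    copies = 0
    src = memoryview(root_buf)       # stands in for a direct view of root memory
    received = []
    for _ in range(n_receivers):
        dst = bytearray(len(root_buf))
        dst[:] = src                 # copy: root buffer -> receiver buffer
        copies += 1
        received.append(bytes(dst))
    return received, copies
```

In this model a broadcast to N receivers costs N + 1 copies through the staging buffer and N copies with direct access; actual performance on real hardware depends on factors the model ignores.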
