Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures

Abstract

Emerging multi-/many-core processors such as Intel Xeon and Xeon Phi are being widely adopted in modern large-scale supercomputing systems. Architectural features of these systems, such as high core density, mesh interconnects, deeper memory hierarchies, and hardware multi-threading, give application developers opportunities to exploit more parallelism. However, they also pose significant challenges for MPI runtimes trying to optimize communication performance. One major challenge is optimizing collective communication for dense multi-/many-core processors. Traditionally, MPI runtimes have used send/recv, direct shared-memory ("double-copy"), or kernel-assisted ("single-copy") mechanisms for intra-node collective communication. However, existing collective designs based on these mechanisms suffer from several bottlenecks, such as multiple copies, per-message handshakes, and kernel-level lock contention, that limit their performance. In this paper, we first characterize the bottlenecks associated with these approaches to designing collectives in MPI. Then, we propose efficient "Shared-address space"-based designs to implement different MPI collectives. Finally, we show the efficacy of our approach by implementing various MPI collectives. Our proposed designs show up to 11x, 50x, 17x, and 5x performance improvement for Bcast, Scatter, Gather, and Alltoall, respectively, over other state-of-the-art MPI libraries on different multi-/many-core architectures.
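
As a rough illustration of the "double-copy" and "single-copy" terms used in the abstract, the toy model below counts the buffer copies needed to broadcast one message to several receivers on a node. It is only a sketch under stated assumptions: the function names and the copy-counting model are hypothetical, no real MPI or kernel interface is used, and it does not represent the paper's proposed design or model handshakes or lock contention.

```python
# Toy model only: counts buffer copies for an intra-node broadcast.
# Function names and the counting model are illustrative assumptions,
# not an MPI API and not the design proposed in the paper.

def bcast_double_copy(src: bytes, num_receivers: int):
    """Root stages the data in a shared segment; each receiver copies it out."""
    copies = 0
    shared_segment = bytearray(src)  # copy: root buffer -> shared segment
    copies += 1
    outputs = []
    for _ in range(num_receivers):
        outputs.append(bytes(shared_segment))  # copy: shared segment -> receiver
        copies += 1
    return outputs, copies


def bcast_single_copy(src: bytes, num_receivers: int):
    """Each receiver copies directly from the root's buffer, with no staging."""
    copies = 0
    root_view = memoryview(src)  # a view of the root buffer, not a copy
    outputs = []
    for _ in range(num_receivers):
        outputs.append(bytes(root_view))  # copy: root buffer -> receiver
        copies += 1
    return outputs, copies


if __name__ == "__main__":
    message = b"payload"
    _, staged = bcast_double_copy(message, 3)
    _, direct = bcast_single_copy(message, 3)
    print("double-copy copies:", staged)
    print("single-copy copies:", direct)
```

In this model the staged path always costs one more copy than the direct path. On real systems the direct path typically pays for a system call per transfer instead, which is where the kernel-level contention mentioned in the abstract can arise.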