Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures

Abstract

Emerging multi-/many-core processors such as Intel Xeon and Xeon Phi are being widely adopted in modern large-scale supercomputing systems. The architectural features of these systems, such as high core density, mesh interconnects, deeper memory hierarchies, and hardware multi-threading, give application developers opportunities to exploit more parallelism. However, they also pose significant challenges for MPI runtimes in optimizing communication performance. One of the major challenges is optimizing collective communication for dense multi-/many-core processors. Traditionally, MPI runtimes have used send/recv, direct shared-memory ("double-copy"), or kernel-assisted ("single-copy") mechanisms for intra-node collective communication. However, existing collective designs based on these mechanisms suffer from several bottlenecks, such as multiple copies, per-message handshakes, and kernel-level lock contention, that limit their performance. In this paper, we first characterize the bottlenecks associated with these approaches to designing collectives in MPI. Then, we propose efficient "shared-address-space"-based designs to implement different MPI collectives. Finally, we show the efficacy of our approach by implementing various MPI collectives. Our proposed designs show up to 11x, 50x, 17x, and 5x performance improvement for Bcast, Scatter, Gather, and Alltoall, respectively, over other state-of-the-art MPI libraries on different multi-/many-core architectures.

Bibliographic record

Source
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing | 2019 | xxxiii, 709 p. | 10 pages
Authors
Jahanzeb Maqbool Hashmi; Sourav Chakraborty; Mohammadreza Bayatpour; Hari Subramoni; Dhabaleswar K. Panda
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
Keywords
application program interfaces; message passing; microprocessor chips; multiprocessing systems; multi-threading; parallel machines; shared memory systems
