Cluster Computing

Improved MPI collectives for MPI processes in shared address spaces

Abstract

As the number of cores per node keeps increasing, it becomes increasingly important for MPI to leverage shared memory for intranode communication. This paper investigates the design and optimization of MPI collectives for clusters of NUMA nodes. We develop performance models for collective communication using shared memory, and we demonstrate several algorithms for various collectives. Experiments are conducted on both Xeon X5650 and Opteron 6100 InfiniBand clusters. The measurements agree with the model and indicate that different algorithms dominate for short and long vectors. We compare our shared-memory allreduce with several MPI implementations (Open MPI, MPICH2, and MVAPICH2) that utilize system shared memory to facilitate interprocess communication. On a 16-node Xeon cluster and an 8-node Opteron cluster, our implementation achieves geometric-mean speedups of 2.3X and 2.1X, respectively, over the best of these MPI implementations. Our techniques enable efficient implementations of collective operations on future multi- and manycore systems.
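The hierarchical pattern the abstract alludes to (reduce within each node through shared memory, then combine across nodes) can be sketched with standard MPI-3 shared-memory windows. The C sketch below is not the paper's implementation and makes its own simplifying choices (a node-leader reduction and a fixed vector length N); it only illustrates how intranode shared memory and internode messaging compose into an allreduce.

/* A minimal, hedged sketch of a hierarchical shared-memory allreduce.
 * NOT the paper's algorithm; it only illustrates the structure: ranks on
 * one node expose their vectors through an MPI-3 shared window, the node
 * leader reduces them locally, leaders combine across nodes, and the
 * result is broadcast back within each node. */
#include <mpi.h>
#include <stdio.h>

#define N 4   /* vector length, arbitrary for this sketch */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Group the ranks that can share physical memory. */
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    int node_rank, node_size;
    MPI_Comm_rank(node_comm, &node_rank);
    MPI_Comm_size(node_comm, &node_size);

    /* Each rank contributes N doubles to one shared window. */
    double *mine;
    MPI_Win win;
    MPI_Win_allocate_shared((MPI_Aint)(N * sizeof(double)), sizeof(double),
                            MPI_INFO_NULL, node_comm, &mine, &win);

    MPI_Win_lock_all(MPI_MODE_NOCHECK, win);
    for (int i = 0; i < N; i++) mine[i] = world_rank + i;
    MPI_Win_sync(win);          /* make local stores visible */
    MPI_Barrier(node_comm);     /* every rank has written its vector */
    MPI_Win_sync(win);          /* observe the other ranks' stores */

    /* The node leader reduces straight out of shared memory. */
    double local_sum[N], result[N];
    if (node_rank == 0) {
        for (int i = 0; i < N; i++) local_sum[i] = 0.0;
        for (int r = 0; r < node_size; r++) {
            MPI_Aint sz; int du; double *buf;
            MPI_Win_shared_query(win, r, &sz, &du, &buf);
            for (int i = 0; i < N; i++) local_sum[i] += buf[i];
        }
    }
    MPI_Win_unlock_all(win);

    /* Leaders combine across nodes; everyone else waits for the bcast. */
    MPI_Comm leaders;
    MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                   0, &leaders);
    if (node_rank == 0) {
        MPI_Allreduce(local_sum, result, N, MPI_DOUBLE, MPI_SUM, leaders);
        MPI_Comm_free(&leaders);
    }
    MPI_Bcast(result, N, MPI_DOUBLE, 0, node_comm);

    if (world_rank == 0)
        printf("result[0] = %g\n", result[0]);

    MPI_Win_free(&win);
    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}

A production implementation would likely partition long vectors so that every rank on the node reduces a disjoint chunk in parallel instead of serializing at the leader; the abstract's observation that different algorithms dominate for short and long vectors reflects exactly this kind of trade-off.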