Redesigning MPI shared memory communication for large multi-core architecture

Miao Luo; Hao Wang; Jerome Vienne; Dhabaleswar K. (DK) Panda

Abstract

Modern multi-core platforms are evolving rapidly, with 32 or 64 cores per node. Sharing system resources can increase communication efficiency between processes on the same node, but it also increases contention for those resources. Currently, most MPI libraries are developed for systems with a relatively small number of cores per node. On emerging multi-core systems with hundreds of cores per node, existing shared memory mechanisms for MPI runtimes will suffer from scalability problems, which may limit the benefits gained from multi-core systems. In this paper, we first analyze these problems and then propose a set of new schemes for small and large message transfer over shared memory. The "Shared Tail Cyclic Buffer" scheme is proposed to reduce the number of read and write operations on shared control structures. The "State-Driven Polling" scheme is proposed to optimize message polling through dynamically adjusted polling frequency on different communication pairs. Through dynamic distribution of runtime pinned-down memory, the "On-Demand Global Shared Memory Pool" is proposed to bring the benefits of pair-wise buffers to large message transfer and to optimize shared send buffer utilization without increasing total shared memory usage. In micro-benchmark evaluation, the new schemes bring up to 26 % and 70 % improvement in point-to-point latency and bandwidth, respectively. For applications, the new schemes achieve 18 % improvement on the 64-core/node Bulldozer system for the Graph500 benchmark, and up to 11 % improvement for the NAS benchmarks. In a 512-process evaluation on the 32-core Trestles system, the new schemes achieve 16 % improvement for the NAS CG benchmark.
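The abstract gives no implementation details, so the following is only a toy sketch of the general idea behind adjusting polling frequency per communication pair. It is not the paper's "State-Driven Polling" algorithm. The `Peer` class, the `poll_round` function, the doubling back-off rule, and the constants `MAX_INTERVAL` and `MISS_LIMIT` are all assumptions made for illustration, and an in-process queue stands in for a shared-memory buffer.

```python
from collections import deque

MAX_INTERVAL = 8  # assumed cap on how rarely an idle peer is polled
MISS_LIMIT = 2    # assumed number of empty polls before backing off


class Peer:
    """Polling state for one communication pair (hypothetical structure)."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()  # stands in for the pair's shared-memory buffer
        self.interval = 1     # this peer is polled every `interval` rounds
        self.misses = 0       # consecutive empty polls at the current interval


def poll_round(peers, round_no):
    """Poll only the peers that are due this round; return (name, message) pairs."""
    received = []
    for p in peers:
        if round_no % p.interval != 0:
            continue  # idle peer: skip it this round to save polling work
        if p.queue:
            received.append((p.name, p.queue.popleft()))
            p.interval = 1  # traffic seen: return to polling every round
            p.misses = 0
        else:
            p.misses += 1
            if p.misses >= MISS_LIMIT:
                # No traffic for a while: poll this peer half as often.
                p.interval = min(p.interval * 2, MAX_INTERVAL)
                p.misses = 0
    return received
```

In this sketch a peer that keeps sending is checked every round, while a silent peer is checked less and less often until a message from it is finally seen, at which point it returns to per-round polling. The trade-off is that the first message from a long-idle peer may wait up to `MAX_INTERVAL - 1` rounds before it is noticed.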


Bibliographic details

Source: Computer science, 2013, Issue 3, pp. 137-146 (10 pages)
Authors: Miao Luo; Hao Wang; Jerome Vienne; Dhabaleswar K. (DK) Panda
Affiliation (listed for each author): Department of Computer Science and Engineering, The Ohio State University, Columbus, OH, USA
Original format: PDF
Language: English
Keywords: MPI; Shared memory; Runtime; Multi-core
