MPI+ULT: Overlapping Communication and Computation with User-Level Threads

Abstract

As the core density of future processors keeps increasing, MPI+Threads is becoming a promising programming model for large-scale SMP clusters. Generally speaking, a hybrid MPI+Threads runtime can substantially improve intra-node parallelism and data sharing on shared-memory architectures. However, it does little for inter-node communication because existing communication and threading libraries are integrated inefficiently. More specifically, existing MPI+Threads runtime systems use coarse-grained locks to guarantee thread safety, which leads to heavy lock contention and limits the scalability of the runtime. While kernel threads are efficient for intra-node parallelism, we found that they are too heavyweight for computation/communication overlap in an MPI+Threads runtime system. In this paper we propose a new approach to asynchronous MPI communication with user-level threads (MPI+ULT). By enabling ULT context switching inside MPI, MPI communication in one ULT can overlap with computation or communication in other ULTs. MPI+ULT can be used to hide communication in various scenarios, including MPI point-to-point, collective, and one-sided calls. We apply MPI+ULT to two applications, a high-performance conjugate gradient benchmark and a genome assembly application, to show how MPI+ULT can effectively hide communication and reduce runtime overhead. Experiments show that our method significantly improves the performance of these applications.
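The central idea, that an MPI call which would otherwise block instead switches to another ULT, can be illustrated with a toy, single-process simulation. This is a minimal sketch under stated assumptions, not the authors' implementation, and it uses no real MPI or threading library: Python generators stand in for user-level threads, `SimRequest` is a hypothetical stand-in for a nonblocking MPI request, and `ult_wait`, `comm_ult`, `compute_ult`, and `run` are illustrative names. In the paper the switch happens inside MPI itself; here it is an explicit `yield`.

```python
from collections import deque


class SimRequest:
    """Hypothetical stand-in for a nonblocking MPI request (not a real MPI API).

    The simulated message counts as delivered after `polls_needed` calls to test().
    """

    def __init__(self, polls_needed: int) -> None:
        self.remaining = polls_needed

    def test(self) -> bool:
        # Each poll models the progress engine advancing the transfer by one step.
        if self.remaining > 0:
            self.remaining -= 1
        return self.remaining == 0


def ult_wait(req, log, name):
    """Blocking-style wait that yields to other ULTs instead of busy-waiting."""
    while not req.test():
        log.append(f"{name}: waiting, switch")
        yield  # cooperative context switch to the next ready ULT
    log.append(f"{name}: message arrived")


def comm_ult(log):
    """A ULT that posts a (simulated) receive and then waits on it."""
    req = SimRequest(polls_needed=3)
    log.append("comm: posted receive")
    yield from ult_wait(req, log, "comm")
    log.append("comm: done")


def compute_ult(log, steps):
    """A ULT doing computation in chunks, yielding between chunks."""
    for i in range(steps):
        log.append(f"compute: step {i}")
        yield
    log.append("compute: done")


def run(ults):
    """Round-robin scheduler: resume each ULT until it yields or finishes."""
    ready = deque(ults)
    while ready:
        ult = ready.popleft()
        try:
            next(ult)
            ready.append(ult)  # ULT yielded: requeue it
        except StopIteration:
            pass  # ULT finished: drop it


demo_log = []
run([comm_ult(demo_log), compute_ult(demo_log, 3)])
print("\n".join(demo_log))
```

In the printed log, `compute: step 0` and `compute: step 1` run while the receive is still outstanding, so the wait is overlapped with computation instead of stalling the whole process. The round-robin `deque` is simply the smallest scheduling policy that shows the effect.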

Bibliographic details

Source
2015 IEEE 17th International Conference on High Performance Computing and Communications, 2015 IEEE 7th International Symposium on Cyberspace Safety and Security, 2015 IEEE 12th International Conference on Embedded Software and Systems | 2015 | pp. 444-454 | 11 pages
Conference location: New York, NY (US)
Authors
Huiwei Lu; Sangmin Seo; Pavan Balaji
Affiliation

Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA

Original format: PDF
Language: English
Keywords
application program interfaces; memory architecture; parallel architectures; shared memory systems; MPI-and-ULT; asynchronous MPI communication-with-user-level thread; data sharing; high-performance conjugate gradient benchmark; hybrid MPI-and-Thread runtime system; inter-node communication; intra-node parallelism; large scale SMP cluster; shared-memory architecture; threading libraries; Computational modeling; Context; Instruction sets; Kernel; Message systems; Runtime; Switches; MPI+X; Message Passing Interface; Ove;

Similar literature

1. Overlapping Communication With Computation in Parameter Server for Scalable DL Training [J]. Wang Shaoqi, Pi Aidi, Zhou Xiaobo. IEEE Transactions on Parallel and Distributed Systems. 2021, no. 9.

2. Overlapping communication and computation of GPU/CPU heterogeneous parallel spatial domain decomposition MOC method [J]. Liang Liang, Zhang Qian, Song Peitao. Annals of Nuclear Energy. 2020, issue Jana.

3. Maximizing Communication-Computation Overlap Through Automatic Parallelization and Run-time Tuning of Non-blocking Collective Operations [J]. Barigou Youcef, Gabriel Edgar. International Journal of Parallel Programming. 2017, no. 6.

4. MPI+ULT: Overlapping Communication and Computation with User-Level Threads [C]. Huiwei Lu, Sangmin Seo, Pavan Balaji. IEEE International Conference on High Performance Computing and Communications. 2015.

5. Efficient Parallel All-Pairs Computation Framework: Using Computation-Communication Overlap [D]. Yeleswarapu, Venkata Kasi Viswanath. 2017.

6. Retraction notice to "Overlapping signal sequences control nuclear localization and endoplasmic reticulum retention of GRP58" [Biochemical and Biophysical Research Communications 377 (2) (2008) 407–412] [O]. Anbu Karani Adikesavan, Emmanual Unni, Anil K. Jaiswal.

7. Effective parallel computation on workstation cluster with a user-level communication network [O]. Hoe, James C. (James Chu-Yue). 1994.
