Maximizing Communication-Computation Overlap Through Automatic Parallelization and Run-time Tuning of Non-blocking Collective Operations

Barigou Youcef; Gabriel Edgar

Abstract

Non-blocking collective communication operations extend the concept of collective operations by offering the additional benefit of being able to overlap communication and computation. They are often considered key building blocks for scaling applications to very large process counts. Yet, using non-blocking collective operations in real-world applications is non-trivial. Application codes often have to be restructured significantly in order to maximize the communication-computation overlap. This paper presents an approach to maximize the communication-computation overlap for hybrid OpenMP/MPI applications. The work leverages automatic parallelization by extending the ability of an existing tool to utilize non-blocking collective operations. It further integrates run-time auto-tuning techniques for non-blocking collective operations, optimizing both the algorithms used for the non-blocking collective operations and the location and frequency of the accompanying progress function calls. Four application benchmarks were used to demonstrate the efficiency and versatility of the approach on two different platforms. The results indicate significant performance improvements in virtually all test scenarios. The resulting parallel applications achieved a performance improvement of up to 43% compared to the version using blocking communication operations, and up to 95% of the maximum theoretical communication-computation overlap identified for each scenario.
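The tuning idea in the abstract (choosing an algorithm and a progress-call frequency at run time) can be illustrated with a small sketch. This is not the authors' tool and uses no MPI: the algorithm table, the cost model, and every name below (`ALGORITHMS`, `Config`, `overlapped_time`, `tune`) are hypothetical, and the numbers are made up for illustration. The toy model assumes each progress call lets one round of the collective complete behind the computation, at a fixed overhead per call, and that any remaining rounds are paid for in the final wait.

```python
from dataclasses import dataclass
from itertools import product

# Hypothetical algorithm table: name -> (rounds, time per round).
# Values are illustrative only, not measurements from the paper.
ALGORITHMS = {"binomial": (4, 6.0), "linear": (8, 2.0)}


@dataclass(frozen=True)
class Config:
    algorithm: str        # which collective algorithm to use
    progress_calls: int   # progress calls spread over the compute phase


def blocking_time(algorithm, compute=100.0):
    """Baseline: computation, then the whole collective with no overlap."""
    rounds, round_time = ALGORITHMS[algorithm]
    return compute + rounds * round_time


def overlapped_time(cfg, compute=100.0, call_overhead=0.5):
    """Toy model: each progress call hides one round behind computation."""
    rounds, round_time = ALGORITHMS[cfg.algorithm]
    leftover = max(0, rounds - cfg.progress_calls)  # rounds paid in the wait
    return compute + cfg.progress_calls * call_overhead + leftover * round_time


def tune(candidates, measure):
    """Brute-force selection: evaluate every candidate, keep the fastest."""
    return min(candidates, key=measure)


CANDIDATES = [Config(a, k) for a, k in product(ALGORITHMS, (0, 2, 4, 8, 16))]
BEST = tune(CANDIDATES, overlapped_time)

if __name__ == "__main__":
    print("selected:", BEST)
    print("time with overlap:", overlapped_time(BEST))
    print("blocking baseline:", blocking_time(BEST.algorithm))
```

In this model too few progress calls leave rounds exposed in the final wait, while too many add overhead without hiding anything further, which is why the frequency is worth tuning alongside the algorithm. A real tuner would time actual iterations of the application instead of evaluating a cost function.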

Bibliographic details

Source
International Journal of Parallel Programming | 2017, No. 6 | pp. 1390-1416 | 27 pages
Authors
Barigou Youcef; Gabriel Edgar
Affiliations

Department of Computer Science, University of Houston, Houston, TX, United States (listed for both authors)

Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Keywords
Auto-tuning; Communication-computation overlap; MPI; Non-blocking collective operations; OpenMP
