Published in: International Journal of Parallel Programming

Maximizing Communication-Computation Overlap Through Automatic Parallelization and Run-time Tuning of Non-blocking Collective Operations

Abstract

Non-blocking collective communication operations extend the concept of collective operations by making it possible to overlap communication with computation. They are often considered key building blocks for scaling applications to very large process counts. Yet using non-blocking collective operations in real-world applications is non-trivial: application codes often have to be restructured significantly in order to maximize the communication-computation overlap. This paper presents an approach to maximize the communication-computation overlap for hybrid OpenMP/MPI applications. The work leverages automatic parallelization by extending an existing tool so that it can utilize non-blocking collective operations. It further integrates run-time auto-tuning of non-blocking collective operations, optimizing both the algorithms used for the non-blocking collective operations and the location and frequency of the accompanying progress function calls. Four application benchmarks were used to demonstrate the efficiency and versatility of the approach on two different platforms. The results indicate significant performance improvements in virtually all test scenarios. The resulting parallel applications achieved a performance improvement of up to 43% compared to the version using blocking communication operations, and up to 95% of the maximum theoretical communication-computation overlap identified for each scenario.
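To make the notion of a "maximum theoretical overlap" more concrete, here is a minimal sketch of one common back-of-the-envelope model. It is an assumption for illustration, not the definition used in the paper: with blocking collectives, computation and communication run back to back, while with perfect overlap the shorter phase is hidden entirely behind the longer one. The function names and the sample timings are hypothetical.

```python
def blocking_time(t_comp: float, t_comm: float) -> float:
    """Blocking collective: computation and communication are serialized."""
    return t_comp + t_comm


def ideal_overlapped_time(t_comp: float, t_comm: float) -> float:
    """Perfect overlap (assumed model): the shorter phase is fully hidden."""
    return max(t_comp, t_comm)


def max_theoretical_improvement(t_comp: float, t_comm: float) -> float:
    """Largest possible fractional run-time reduction under the assumed model."""
    blocking = blocking_time(t_comp, t_comm)
    return (blocking - ideal_overlapped_time(t_comp, t_comm)) / blocking


def achieved_overlap_fraction(t_comp: float, t_comm: float,
                              t_measured: float) -> float:
    """Share of the hideable time min(t_comp, t_comm) that was actually hidden."""
    hideable = min(t_comp, t_comm)
    if hideable <= 0:
        raise ValueError("nothing to overlap: one phase has zero duration")
    hidden = blocking_time(t_comp, t_comm) - t_measured
    return hidden / hideable


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only for illustration.
    t_comp, t_comm, t_measured = 8.0, 2.0, 8.5
    print("max improvement:", max_theoretical_improvement(t_comp, t_comm))
    print("achieved overlap:", achieved_overlap_fraction(t_comp, t_comm, t_measured))
```

Under this model, the achievable gain is bounded by the shorter of the two phases, which is one reason why the placement and frequency of progress calls can matter: communication that does not advance while the computation runs cannot be hidden.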