Conference: International conference on parallel and distributed computing

Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor


Abstract

To amortize the cost of MPI collective operations, non-blocking collectives have been proposed so that communication can be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores yields no overlap. To address these issues, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on application cores and the long but narrow part on one or several communication cores, trading off overlap against absolute performance. We provide a model to study and predict the algorithm's behavior and to tune its parameters. We implemented it in the MPC framework, a thread-based MPI implementation. We ran benchmarks on manycore processors such as the KNL and Skylake and obtained good results for both performance and overlap.
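
The abstract does not state the rule used to split the tree, so the following is only a toy sketch of the general idea, not the authors' algorithm. It assumes a binary reduction tree over a power-of-two number of ranks and a hypothetical width threshold, `max_comm_width`: levels with more combine operations than the threshold stay on application cores, and narrower levels go to communication cores. The function name and both parameters are illustrative assumptions.

```python
def split_reduction_tree(num_ranks, max_comm_width):
    """Assign each level of a binary reduction tree to a core type.

    Level 0 combines pairs of ranks; each later level halves the number
    of pairwise combine operations until a single root remains.
    Both the tree shape and the threshold rule are assumptions.
    """
    if num_ranks < 2 or num_ranks & (num_ranks - 1):
        raise ValueError("num_ranks must be a power of two >= 2")
    plan = []
    ops = num_ranks // 2  # combine operations at the leaf level
    level = 0
    while ops >= 1:
        # Hypothetical rule: wide levels stay on application cores,
        # narrow levels are offloaded to communication cores.
        owner = "application" if ops > max_comm_width else "communication"
        plan.append((level, ops, owner))
        ops //= 2
        level += 1
    return plan


plan = split_reduction_tree(num_ranks=64, max_comm_width=4)
for level, ops, owner in plan:
    print(f"level {level}: {ops:2d} combine ops -> {owner} cores")
```

In this toy setting, raising `max_comm_width` offloads more levels, which would favor overlap, while lowering it keeps more work on application cores. The paper's model presumably tunes a trade-off of this kind, though its actual parameters are not given here.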
