Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor

Denis Alexandre; Jaeger Julien; Jeannot Emmanuel; Perache Marc; Taboada Hugo

Abstract

To amortize the cost of MPI collective operations, nonblocking collectives have been proposed to allow communication to be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores yields no overlap. In this article, we propose placement algorithms for progress threads that achieve communication/computation overlap by running on cores dedicated to communication, without degrading performance. We first show that even simple collective operations, such as those based on a chain topology, are not straightforward to progress in the background on a dedicated core. We then propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on the application cores and the long but narrow part of the tree on one or several communication cores, trading off overlap against absolute performance. We provide a model to study and predict the algorithm's behavior and to tune its parameters. We implemented both algorithms in the MultiProcessor Computing (MPC) framework, a thread-based MPI implementation. We ran benchmarks on manycore processors such as Intel Knights Landing (KNL) and Skylake and obtained good results for both performance and overlap.
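
The trade-off behind the tree split can be illustrated with a deliberately simplified cost model. The sketch below is our own assumption, not the authors' model: it assumes a binomial-tree broadcast over 2^levels ranks, a uniform cost msg_cost per message, a single communication core that serializes every send it handles, and application cores that progress the collective only after their computation ends. Rounds 0..k-1 (few senders per round, the narrow part) run on the communication core and overlap with computation; the remaining rounds (many concurrent senders, the heavy part) run afterwards on the application cores. The function names completion_time and best_split are hypothetical.

```python
"""Hypothetical toy cost model for splitting a binomial-tree broadcast
between one communication core and the application cores.
Assumptions (not from the paper): uniform per-message cost, one
communication core that serializes its sends, application cores that only
progress the collective once their computation is finished."""


def completion_time(k: int, levels: int, compute: float, msg_cost: float) -> float:
    """Time until both the computation and the broadcast are done.

    Rounds 0..k-1 run on a single communication core, which must serialize
    all 2**r sends of round r; they overlap with `compute` on the
    application cores. Rounds k..levels-1 run afterwards on the application
    cores, one parallel round each.
    """
    if not 0 <= k <= levels:
        raise ValueError("k must be in [0, levels]")
    comm_core = (2**k - 1) * msg_cost      # sum of 2**r sends for r < k, serialized
    app_cores = (levels - k) * msg_cost    # one message per sender per round, in parallel
    # The bottom rounds can start only once computation ends and the top
    # rounds have delivered the data.
    return max(compute, comm_core) + app_cores


def best_split(levels: int, compute: float, msg_cost: float) -> int:
    """Return the split level k minimizing completion time (ties -> smallest k)."""
    return min(range(levels + 1),
               key=lambda k: completion_time(k, levels, compute, msg_cost))


if __name__ == "__main__":
    # Synthetic parameters: 1024 ranks (10 rounds), 100 time units of
    # computation, 1 time unit per message.
    for k in (0, best_split(10, 100.0, 1.0), 10):
        print(k, completion_time(k, 10, 100.0, 1.0))
```

With these synthetic parameters, the best split is k = 6, for a completion time of 104 units, compared with 110 units when everything runs on the application cores (k = 0, no overlap) and 1023 units when everything runs on the communication core (k = 10). The numbers only show the shape of the trade-off, not measured behavior.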

Bibliographic details

Source: Experimental Mechanics | 2019, no. 6 | pp. 1240-1254 | 15 pages
Authors: Denis Alexandre; Jaeger Julien; Jeannot Emmanuel; Perache Marc; Taboada Hugo
Affiliations:
Univ Bordeaux, Bordeaux INP, CNRS, Inria, LaBRI, Bordeaux, France
CEA, DIF, DAM, F-91297 Arpajon, France
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords: Nonblocking collectives; MPI; placement; communication; computation overlap

Similar documents

1. FluidCheck: A Redundant Threading-Based Approach for Reliable Execution in Manycore Processors [J]. Kalayappan Rajshekar, Sarangi Smruti R. ACM Transactions on Architecture and Code Optimization. 2015, no. 4.

2. A scalability prediction approach for multi-threaded applications on manycore processors [J]. Bai Xiuxiu, Wang Endong, Dong Xiaoshe. Journal of Supercomputing. 2015, no. 11.

3. Parallel programming model for the Epiphany many-core coprocessor using threaded MPI [J]. J. Arul. Computing Reviews. 2016, no. 11.

4. Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor [C]. Alexandre Denis, Julien Jaeger, Emmanuel Jeannot. International Conference on Parallel and Distributed Computing. 2018.

5. Efficient throughput cores for asymmetric manycore processors [D]. Tarjan, David. 2009.

6. Influence of Processing Parameters on the Thread and Spline Synchronous Rolling Process: An Experimental Study [O]. Da-Wei Zhang, Bing-Kun Liu, Sheng-Dun Zhao. 2019.

7. Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor [O]. Alexandre Denis, Julien Jaeger, Emmanuel Jeannot. 2019.
