Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor

Abstract

To amortize the cost of MPI collective operations, non-blocking collectives have been proposed so that communication can be overlapped with computation. Unfortunately, collective communications are more CPU-hungry than point-to-point communications, and running them in a communication thread on a dedicated CPU core makes them slow. On the other hand, running collective communications on the application cores provides no overlap. To address these issues, we propose an algorithm for tree-based collective operations that splits the tree between communication cores and application cores. To get the best of both worlds, the algorithm runs the short but heavy part of the tree on the application cores and the long but narrow part of the tree on one or several communication cores, trading off overlap against absolute performance. We provide a model to study and predict its behavior and to tune its parameters. We implemented the algorithm in the MPC framework, a thread-based MPI implementation, and ran benchmarks on manycore processors such as Intel Knights Landing (KNL) and Skylake, obtaining good results for both performance and overlap.
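The split between a narrow and a heavy part of the tree can be made concrete on a binomial-tree broadcast. The following Python sketch is only an illustration under our own assumptions, not the algorithm as implemented in MPC: it assumes a binomial broadcast rooted at rank 0, in which round r carries up to 2^r messages, so the first rounds are narrow while the last few rounds carry most of the traffic. The hypothetical parameter `split_round` cuts the schedule into a long but narrow prefix (a candidate for a communication core) and a short but heavy suffix (a candidate for the application cores). The budget-based `choose_split` rule is likewise hypothetical and is not the paper's tuning model. The code only computes the schedule; it sends no messages.

```python
"""Illustrative split of a binomial-tree broadcast schedule (hypothetical sketch)."""


def binomial_bcast_rounds(num_ranks):
    """Return the rounds of a binomial-tree broadcast rooted at rank 0.

    Each round is a list of (sender, receiver) pairs. Before the round with
    stride `step`, ranks 0 .. step-1 already hold the data.
    """
    rounds = []
    step = 1
    while step < num_ranks:
        # Every rank that already holds the data forwards it `step` ranks higher,
        # skipping targets beyond the last rank.
        rounds.append([(src, src + step)
                       for src in range(step) if src + step < num_ranks])
        step *= 2
    return rounds


def split_schedule(rounds, split_round):
    """Split into a narrow prefix (rounds < split_round) and a heavy suffix.

    Hypothetical mapping: the prefix could run on a communication core,
    the suffix on the application cores.
    """
    return rounds[:split_round], rounds[split_round:]


def describe(part):
    """Return (length in rounds, weight in messages) for part of a schedule."""
    return len(part), sum(len(rnd) for rnd in part)


def choose_split(rounds, max_comm_messages):
    """Hypothetical tuning rule: the largest split whose narrow prefix sends
    at most `max_comm_messages` messages (0 if even the first round exceeds it)."""
    best, sent = 0, 0
    for k, rnd in enumerate(rounds, start=1):
        sent += len(rnd)
        if sent > max_comm_messages:
            break
        best = k
    return best


if __name__ == "__main__":
    rounds = binomial_bcast_rounds(256)
    narrow, heavy = split_schedule(rounds, 5)
    print(describe(narrow), describe(heavy))  # (5, 31) (3, 224)
    print(choose_split(rounds, 40))  # 5
```

With 256 ranks and a split at round 5, the prefix holds 31 of the 255 messages spread over 5 rounds, while the last 3 rounds carry the remaining 224. This is the kind of asymmetry between a long, narrow part and a short, heavy part that the abstract describes.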

Bibliographic record

Source: International Conference on Parallel and Distributed Computing | 2018 | pp. 616-627 | 12 pages
Authors: Alexandre Denis; Julien Jaeger; Emmanuel Jeannot; Marc Perache; Hugo Taboada
Original format: PDF

Similar documents

1. Study on progress threads placement and dedicated cores for overlapping MPI nonblocking collectives on manycore processor [J]. Denis Alexandre, Jaeger Julien, Jeannot Emmanuel. Experimental Mechanics, 2019, No. 6.

2. Static/Dynamic Validation of MPI Collective Communications in Multi-threaded Context [J]. Saillard Emmanuelle, Carribault Patrick, Barthou Denis. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2015, No. 8.

3. High Performance and Enhanced Scalability for Parallel Applications using MPI-3's non-blocking Collectives [J]. Surendra Varma Pericherla, Sathish Vadhiyar. Procedia Computer Science, 2017, No. 1.

4. Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor [C]. Alexandre Denis, Julien Jaeger, Emmanuel Jeannot. International Conference on Parallel and Distributed Computing, 2018.

5. Fastener dynamics: Optimum placement, effect of thread dimensional conformance, and threadlocker life [D]. Dong, Yubo. 1998.

6. From point process observations to collective neural dynamics: Nonlinear Hawkes process GLMs, low-dimensional dynamics and coarse graining [O]. Wilson Truccolo.

7. Dynamic Placement of Progress Thread for Overlapping MPI Non-blocking Collectives on Manycore Processor [O]. Alexandre Denis, Julien Jaeger, Emmanuel Jeannot. 2018.
