Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications

Abstract

Due to the strict communication dependences in the global collective communication of MPI applications, noise that delays one process can amplify across processes in a large run. The overhead caused by noise amplification can increase dramatically as we scale the application to a very large number of processes (10,000 or more). For hybrid OpenMP/MPI (or MPI+X) applications, we can reduce noise amplification with on-node dynamic thread scheduling. However, the dequeue overhead of such schemes can be steep. To mitigate this cost, we have introduced lightweight scheduling, which combines dynamic and static task scheduling to reduce the total number of dequeue operations while still absorbing noise. Our scheme provides portability and performance consistency without reducing the absolute performance of the application. In this work, we reduce the overhead of our scheme further by carefully using more static scheduling when we know that noise will not be amplified. We exploit a priori knowledge of per-process MPI slack to increase the static fraction for those MPI processes that are known not to be on the critical path and are thus unlikely to amplify noise. When we run an algebraic multigrid application with up to 16,384 processes (1,024 nodes) on a NUMA cluster, this technique gives an 11% performance gain over the original lightweight scheduling (a 17% gain over OpenMP static scheduling), and we project further performance gains on machines with more than 10,000 nodes.
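The abstract does not spell out the scheduler. The following is therefore only a minimal Python sketch under simplifying assumptions, not the authors' implementation. It models one loop on one node with unit cost per iteration, treats noise as a fixed start-up delay per thread, and serves the dynamic part from a greedy shared pool. It illustrates the two ideas in the abstract. First, a static fraction of iterations is assigned to threads up front, so only the remainder incurs dequeue operations. Second, the dynamic remainder lets other threads absorb the work of a noise-delayed thread. The names `split_iterations`, `simulate_loop`, and `slack_conscious_fraction`, the cost model, and the linear slack-to-static-fraction rule are all hypothetical. The rule only illustrates the abstract's point that processes off the critical path can afford more static scheduling.

```python
import heapq


def split_iterations(n_iters, n_threads, static_fraction):
    """Split a loop into per-thread static chunks plus a shared dynamic pool.

    The first int(n_iters * static_fraction) iterations are divided as evenly
    as possible across threads ahead of time (no dequeue needed); the rest
    form a pool that threads dequeue from after finishing their static share.
    """
    if not 0.0 <= static_fraction <= 1.0:
        raise ValueError("static_fraction must be in [0, 1]")
    n_static = int(n_iters * static_fraction)
    base, extra = divmod(n_static, n_threads)
    chunks, start = [], 0
    for t in range(n_threads):
        size = base + (1 if t < extra else 0)
        chunks.append(range(start, start + size))
        start += size
    return chunks, range(n_static, n_iters)


def simulate_loop(n_iters, thread_delays, static_fraction,
                  chunk=1, cost=1.0, dequeue_cost=0.0):
    """Return (makespan, dequeues) for one loop on one node (toy model).

    thread_delays[t] models noise that stalls thread t before it starts.
    Each dynamic chunk goes to whichever thread becomes free first.
    """
    chunks, pool = split_iterations(n_iters, len(thread_delays), static_fraction)
    # Time at which each thread finishes its static share.
    ready = [(d + len(c) * cost, t)
             for t, (d, c) in enumerate(zip(thread_delays, chunks))]
    heapq.heapify(ready)
    remaining, dequeues = len(pool), 0
    while remaining > 0:
        free_at, t = heapq.heappop(ready)  # earliest-free thread dequeues
        take = min(chunk, remaining)
        remaining -= take
        dequeues += 1
        heapq.heappush(ready, (free_at + dequeue_cost + take * cost, t))
    return max(time for time, _ in ready), dequeues


def slack_conscious_fraction(base_fraction, slack, step_time):
    """Hypothetical rule: raise the static fraction in proportion to slack.

    A process whose MPI slack covers the whole step runs fully static;
    a process with no slack (on the critical path) keeps base_fraction.
    """
    ratio = min(max(slack / step_time, 0.0), 1.0)
    return base_fraction + (1.0 - base_fraction) * ratio


if __name__ == "__main__":
    noisy = [10, 0, 0, 0]  # thread 0 loses 10 time units to noise
    print(simulate_loop(160, noisy, 1.0))  # fully static: (50.0, 0)
    print(simulate_loop(160, noisy, 0.5))  # half static: (43.0, 80)
    print(slack_conscious_fraction(0.5, slack=0.5, step_time=1.0))  # 0.75
```

With one thread delayed by 10 units, the fully static loop finishes at 50 and the half-static loop finishes at 43, at the price of 80 dequeues. Setting a nonzero `dequeue_cost` makes each dequeue add time, which shows the overhead that a larger static fraction avoids.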

Bibliographic record

Source
2012 SC Companion: High Performance Computing, Networking, Storage and Analysis, 2012, p. 1392 (1 page)
Conference location: Salt Lake City, UT (US)
Authors
Kale Vivek; Gamblin Todd; Hoefler Torsten; de Supinski Bronis R.; Gropp William D.
Original format: PDF
Text language: English
CLC classification: Computer networks
Keywords
Dynamic Scheduling; MPI; Noise Amplification; OpenMP; Performance Optimization; Slack; System Noise

Similar documents

1. EREINIT: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications [J]. Chakraborty Sourav, Laguna Ignacio, Emani Murali. Concurrency, practice and experience, 2020, no. 3.

2. MigPF: Towards on self-organizing process rescheduling of Bulk-Synchronous Parallel applications [J]. Rodrigo da Rosa Righi, Roberto de Quadros Gomes, Vinicius Facco Rodrigues. Future generation computer systems, 2018, no. pta1.

3. Designing parallel loop self-scheduling schemes using the hybrid MPI and OpenMP programming model for multi-core grid systems [J]. Chao-Chin Wu, Chao-Tung Yang, Kuan-Chou Lai. Journal of supercomputing, 2012, no. 1.

4. Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications [C]. Kale Vivek, Gamblin Todd, Hoefler Torsten. International Conference for High Performance Computing, Networking, Storage and Analysis, 2012.

5. Software/hardware co-design to improve productivity, portability, and performance of loop-task parallel applications [D]. Kim, Ji Yun. 2017.

6. Improving on Half-Lightweight Male Judokas High Performance by the Application of the Analytic Network Process [O]. Sugoi Uriarte Marcos, Raúl Rodríguez-Rodríguez, Juan-José Alfaro-Saiz. 2021.

7. Hierarchical Dynamic Loop Self-Scheduling on Distributed-Memory Systems Using an MPI+MPI Approach [O]. Ahmed Eleliemy, Florina M. Ciorba. 2019.
