Published in: 2012 SC Companion: High Performance Computing, Networking, Storage and Analysis

Abstract: Slack-Conscious Lightweight Loop Scheduling for Improving Scalability of Bulk-synchronous MPI Applications


Abstract

Because global collective communication in MPI applications imposes strict communication dependences, noise that delays one process can be amplified across all processes in a large run. The overhead caused by this noise amplification can increase dramatically as we scale the application to very large numbers of processes (10,000 or more). For hybrid OpenMP/MPI (or MPI+X) applications, we can reduce noise amplification with on-node dynamic thread scheduling. However, the dequeue overhead of such schemes can be steep. To mitigate this cost, we have introduced lightweight scheduling, which combines dynamic and static task scheduling to reduce the total number of dequeue operations while still absorbing noise. Our scheme allows for portability and performance consistency without reducing the absolute performance of the application. In this work, we reduce the overhead of our scheme further by carefully using more static scheduling when we know that noise will not be amplified. We exploit a priori knowledge of per-process MPI slack to reduce the dynamic fraction for those MPI processes that are known not to be on the critical path and are thus unlikely to amplify noise. We find that this technique gives an 11% performance gain over the original lightweight scheduling (a 17% gain over OpenMP static scheduling) when we run an algebraic multigrid application in runs of up to 16,384 processes (1,024 nodes) on a NUMA cluster, and we are able to project further performance gains on machines with node counts beyond 10,000.
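The idea of combining static and dynamic loop scheduling can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `hybrid_schedule` and `static_fraction_for_slack`, the shared-counter work queue, and the linear rule that maps slack to a static fraction are all assumptions made for illustration. The first function runs a statically assigned share of the iterations without synchronization and lets threads dequeue the rest one at a time; the second shows one hypothetical way a process with more MPI slack could be given a larger static share.

```python
import threading


def hybrid_schedule(n_iters, n_threads, static_fraction, body):
    """Run body(i) for every i in range(n_iters).

    The first int(n_iters * static_fraction) iterations are split into
    contiguous per-thread blocks (static part); the rest are pulled one at
    a time from a shared counter (dynamic part).
    Returns the number of dequeue operations performed.
    """
    if not 0.0 <= static_fraction <= 1.0:
        raise ValueError("static_fraction must be in [0, 1]")
    n_static = int(n_iters * static_fraction)
    next_iter = n_static  # shared cursor into the dynamic part
    dequeues = 0
    lock = threading.Lock()

    def worker(tid):
        nonlocal next_iter, dequeues
        # Static phase: no synchronization, each thread owns its block.
        lo = tid * n_static // n_threads
        hi = (tid + 1) * n_static // n_threads
        for i in range(lo, hi):
            body(i)
        # Dynamic phase: threads that finish early absorb delays of others.
        while True:
            with lock:
                if next_iter >= n_iters:
                    return
                i = next_iter
                next_iter += 1
                dequeues += 1
            body(i)

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return dequeues


def static_fraction_for_slack(slack, noise_bound, base_fraction):
    """Hypothetical rule (an assumption, not taken from the paper):
    scale the static share linearly from base_fraction (no slack) up to
    1.0 (slack covers the expected noise)."""
    if noise_bound <= 0:
        return 1.0
    covered = min(max(slack / noise_bound, 0.0), 1.0)
    return base_fraction + (1.0 - base_fraction) * covered
```

With a static fraction of 0.75 and 100 iterations, only the last 25 iterations go through the shared counter, so the number of dequeue operations drops from 100 (fully dynamic) to 25 while the dynamic tail can still rebalance work if one thread is delayed.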
