Conference: IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks

Abstract

Network messaging delay has historically accounted for a large portion of the wall-clock time of High Performance Computing (HPC) applications, because these applications run on many nodes and involve intensive communication among their tasks. The dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. A dragonfly network includes local links that form groups and global links that connect these groups via high-bandwidth optical links. Many aspects of dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links (i.e., the global link arrangement), the bandwidth of the local and global links, and the job allocation algorithm. This paper first introduces a packet-level simulation framework to model the performance of HPC applications in detail. The proposed framework can simulate known MPI (Message Passing Interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that choosing the right combination of system settings and workload allocation algorithms can decrease communication overhead by up to 44%. We also show that the circulant arrangement provides up to 15% higher bisection bandwidth than the other arrangements, but that for realistic workloads the performance impact of link arrangements is less than 3%.
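To make the terms "local links", "global links", and "global link arrangement" concrete, here is a minimal Python sketch of a switch-level dragonfly graph. It is not the paper's simulation framework, and the abstract does not define the arrangements it compares. The sketch assumes a maximal-size dragonfly with `a` switches per group, `h` global ports per switch, and `g = a*h + 1` groups, all-to-all local links inside a group, and an illustrative offset-based wiring rule in which global port `k` of group `i` reaches group `(i + k + 1) mod g`. The names `build_dragonfly` and `diameter` are hypothetical.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(a, h):
    """Build a switch-level dragonfly graph as an adjacency dict.

    Assumptions (not taken from the paper): a switches per group,
    h global ports per switch, g = a*h + 1 groups, and an
    offset-based global link arrangement.
    """
    g = a * h + 1
    adj = {(grp, sw): set() for grp in range(g) for sw in range(a)}

    # Local links: every pair of switches inside a group is connected.
    for grp in range(g):
        for s, t in combinations(range(a), 2):
            adj[(grp, s)].add((grp, t))
            adj[(grp, t)].add((grp, s))

    # Global links: port k of group i reaches group (i + k + 1) mod g.
    # Ports are numbered 0..a*h-1 per group; port k sits on switch k // h.
    for grp in range(g):
        for k in range(a * h):
            dst = (grp + k + 1) % g
            # Port on the destination group that points back to grp.
            back = (grp - dst - 1) % g
            u = (grp, k // h)
            v = (dst, back // h)
            adj[u].add(v)
            adj[v].add(u)
    return adj


def diameter(adj):
    """Largest shortest-path length (in hops) over all switch pairs, via BFS."""
    best = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        best = max(best, max(dist.values()))
    return best


if __name__ == "__main__":
    net = build_dragonfly(a=4, h=2)
    print(len(net), diameter(net))  # prints "36 3"
```

With `a=4` and `h=2` the sketch yields 9 groups and 36 switches, and any two switches are at most three hops apart (local, global, local), which illustrates the low network diameter mentioned in the abstract. Swapping in a different rule for `dst` changes which switch pairs the global links join while keeping one link per pair of groups; that wiring choice is what a global link arrangement refers to.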