IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks



Abstract

Network messaging delay has historically constituted a large portion of the wall-clock time of High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. The dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. A dragonfly network comprises groups of routers connected internally by local links, with the groups connected to one another by high-bandwidth optical global links. Many aspects of dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links (i.e., the global link arrangements), the bandwidth of the local and global links, and the job allocation algorithm. This paper first introduces a packet-level simulation framework that models the performance of HPC applications in detail. The proposed framework can simulate well-known MPI (Message Passing Interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be reduced by up to 44%. We also show that the circulant arrangement provides up to 15% higher bisection bandwidth than the other arrangements, but that for realistic workloads the performance impact of link arrangements is less than 3%.
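
As a concrete illustration of the topology described above (and not of the paper's packet-level simulation framework), the following Python sketch builds a small canonical dragonfly at the router level: all-to-all local links inside each group and one global port per router wired in a circulant-style pattern, followed by a breadth-first search that confirms the three-hop router diameter motivating the topology. The parameters a, h, g and the wiring scheme are assumptions chosen for illustration, not the configurations evaluated in the paper.

# Illustrative sketch only, not the paper's simulator. Parameters and the
# global wiring function are assumptions chosen for illustration.
from collections import deque

a, h = 4, 1                  # routers per group, global ports per router
g = a * h + 1                # balanced dragonfly: g = a*h + 1 groups

def router(group, idx):
    return group * a + idx   # flat router id

adj = {router(gr, r): set() for gr in range(g) for r in range(a)}

def link(u, v):
    adj[u].add(v)
    adj[v].add(u)

# Local links: all-to-all among the a routers of each group.
for gr in range(g):
    for r1 in range(a):
        for r2 in range(r1 + 1, a):
            link(router(gr, r1), router(gr, r2))

# Global links, circulant-style: group gr reaches group gr+d through router
# d-1, and the peer group answers through router (g-1)//2 + d - 1, so every
# router ends up with exactly one global link and every group pair is connected.
for gr in range(g):
    for d in range(1, (g - 1) // 2 + 1):
        link(router(gr, d - 1), router((gr + d) % g, (g - 1) // 2 + d - 1))

def eccentricity(src):
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

diameter = max(eccentricity(v) for v in adj)
print(f"{g} groups x {a} routers: router-hop diameter = {diameter}")  # expects 3

In a balanced dragonfly (g = a*h + 1) every pair of groups shares exactly one global link, so any minimal route is at most local-global-local, i.e., three router hops, which is the low diameter the abstract refers to.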
