IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing

Unveiling the Interplay Between Global Link Arrangements and Network Management Algorithms on Dragonfly Networks



Abstract

Network messaging delay has historically constituted a large portion of the wall-clock time of High Performance Computing (HPC) applications, as these applications run on many nodes and involve intensive communication among their tasks. The dragonfly network topology has emerged as a promising solution for building exascale HPC systems owing to its low network diameter and large bisection bandwidth. A dragonfly network comprises groups of routers connected internally by local links, with the groups connected to one another by high-bandwidth optical global links. Many aspects of dragonfly network design are yet to be explored, such as the performance impact of the connectivity of the global links (i.e., the global link arrangements), the bandwidth of the local and global links, and the job allocation algorithm. This paper first introduces a packet-level simulation framework that models the performance of HPC applications in detail. The proposed framework can simulate well-known MPI (Message Passing Interface) routines as well as applications with custom-defined communication patterns for a given job placement algorithm and network topology. Using this simulation framework, we investigate the coupling between global link bandwidth and arrangements, communication pattern and intensity, job allocation and task mapping algorithms, and routing mechanisms in dragonfly topologies. We demonstrate that by choosing the right combination of system settings and workload allocation algorithms, communication overhead can be reduced by up to 44%. We also show that the circulant arrangement provides up to 15% higher bisection bandwidth than the other arrangements, but that for realistic workloads the performance impact of link arrangements is less than 3%.
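
As a concrete illustration of the topology described above (and not of the paper's packet-level simulation framework), the following Python sketch builds a small canonical dragonfly at the router level: all-to-all local links inside each group and one global port per router wired in a circulant-style pattern, followed by a breadth-first search that confirms the three-hop router diameter motivating the topology. The parameters a, h, g and the wiring scheme are assumptions chosen for illustration, not the configurations evaluated in the paper.

# Illustrative sketch only, not the paper's simulator. Parameters and the
# global wiring function are assumptions chosen for illustration.
from collections import deque

a, h = 4, 1                  # routers per group, global ports per router
g = a * h + 1                # balanced dragonfly: g = a*h + 1 groups

def router(group, idx):
    return group * a + idx   # flat router id

adj = {router(gr, r): set() for gr in range(g) for r in range(a)}

def link(u, v):
    adj[u].add(v)
    adj[v].add(u)

# Local links: all-to-all among the a routers of each group.
for gr in range(g):
    for r1 in range(a):
        for r2 in range(r1 + 1, a):
            link(router(gr, r1), router(gr, r2))

# Global links, circulant-style: group gr reaches group gr+d through router
# d-1, and the peer group answers through router (g-1)//2 + d - 1, so every
# router ends up with exactly one global link and every group pair is connected.
for gr in range(g):
    for d in range(1, (g - 1) // 2 + 1):
        link(router(gr, d - 1), router((gr + d) % g, (g - 1) // 2 + d - 1))

def eccentricity(src):
    dist, q = {src: 0}, deque([src])
    while q:
        u = q.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    return max(dist.values())

diameter = max(eccentricity(v) for v in adj)
print(f"{g} groups x {a} routers: router-hop diameter = {diameter}")  # expects 3

In a balanced dragonfly (g = a*h + 1) every pair of groups shares exactly one global link, so any minimal route is at most local-global-local, i.e., three router hops, which is the low diameter the abstract refers to.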
