IEEE Transactions on Parallel and Distributed Systems

Scalable Deadlock-Free Deterministic Minimal-Path Routing Engine for InfiniBand-Based Dragonfly Networks


Abstract

Dragonfly topologies are attracting great interest as one of the most promising interconnect options for High-Performance Computing (HPC) systems. However, Dragonflies contain physical cycles that may lead to traffic deadlocks unless the routing algorithm prevents them. In general, the existing deadlock-free routing algorithms proposed for Dragonflies, whether deterministic or adaptive, use Virtual Channels (VCs) to prevent cyclic dependencies. However, these topology-aware algorithms are difficult, or even infeasible, to implement in systems based on the InfiniBand (IB) architecture, which is currently the most widely used network technology in HPC systems. This is due to limitations in the IB specification, specifically regarding the way Virtual Lanes (VLs), which are considered similar to VCs, can be assigned to traffic flows. Indeed, none of the routing engines currently available in the official releases of the IB control software has been specifically proposed for Dragonflies. In this paper, we present a new deterministic, minimal-path routing algorithm for Dragonflies that prevents deadlocks using VLs in accordance with the IB specification, so that it can be straightforwardly implemented in IB-based networks. We call this proposal D3R (Deterministic Deadlock-free Dragonfly Routing). Specifically, D3R maps each route to a single, specific VL depending on the destination group, and according to a specific order, so that cyclic dependencies (and hence deadlocks) are prevented. D3R is scalable, as it requires only 2 VLs to prevent deadlocks regardless of network size, i.e., fewer VLs than those required by the deadlock-free routing engines available in IB that are suitable for Dragonflies. Alternatively, D3R achieves higher throughput if an additional VL is used to reduce internal contention in the Dragonfly groups. We have implemented D3R as a new routing engine in OpenSM, the IB control software that includes the subnet manager. We have evaluated D3R by means of simulation and by experiments performed in a real IB-based cluster; the results show that, in general, D3R outperforms other routing engines.
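
The abstract does not spell out the VL assignment rule, so the following Python sketch is only an illustration of the underlying idea, not the D3R algorithm itself. It builds a toy Dragonfly (one global link per switch, so the number of groups is the number of switches per group plus one), computes minimal local-global-local routes, and checks the channel dependency graph for cycles. The link layout in `gateway`, the rule in `ordered_vl` (VL 0 when the destination group index is not lower than the source group index, VL 1 otherwise), and all function names are assumptions made for this example.

```python
from itertools import product


def gateway(g_from, g_to, num_groups):
    """Index of the switch in g_from holding the global link to g_to.

    Assumed layout: switch k of group i links to group (i + k + 1) % num_groups.
    """
    return (g_to - g_from - 1) % num_groups


def minimal_route(src, dst, num_groups):
    """Switch-level minimal path: at most local, global, local."""
    (gs, _), (gd, _) = src, dst
    if gs == gd:
        hops = [src, dst]
    else:
        hops = [src,
                (gs, gateway(gs, gd, num_groups)),
                (gd, gateway(gd, gs, num_groups)),
                dst]
    # Drop repeated switches (source or destination already is a gateway).
    path = [hops[0]]
    for sw in hops[1:]:
        if sw != path[-1]:
            path.append(sw)
    return path


def dependency_graph(num_groups, switches_per_group, pick_vl):
    """Channel dependency graph; a channel is (from_switch, to_switch, vl)."""
    switches = list(product(range(num_groups), range(switches_per_group)))
    graph = {}
    for src, dst in product(switches, repeat=2):
        path = minimal_route(src, dst, num_groups)
        vl = pick_vl(src[0], dst[0])  # one VL for the whole route
        channels = [(u, v, vl) for u, v in zip(path, path[1:])]
        for ch in channels:
            graph.setdefault(ch, set())
        for held, wanted in zip(channels, channels[1:]):
            graph[held].add(wanted)
    return graph


def has_cycle(graph):
    """Kahn's algorithm: nodes that can never be removed lie on a cycle."""
    indegree = {node: 0 for node in graph}
    for successors in graph.values():
        for s in successors:
            indegree[s] += 1
    ready = [n for n, d in indegree.items() if d == 0]
    removed = 0
    while ready:
        node = ready.pop()
        removed += 1
        for s in graph[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return removed < len(graph)


def single_vl(src_group, dst_group):
    return 0


def ordered_vl(src_group, dst_group):
    # Hypothetical ordering rule, NOT taken from the paper.
    return 0 if src_group <= dst_group else 1


print(has_cycle(dependency_graph(3, 2, single_vl)))   # True: deadlock possible
print(has_cycle(dependency_graph(3, 2, ordered_vl)))  # False: 2 VLs suffice here
```

Under this assumed rule, every global hop on VL 0 moves to a higher-numbered group and every global hop on VL 1 moves to a lower-numbered one, so no dependency chain can return to its starting channel. That is why two VLs are enough in the sketch regardless of the number of groups.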
