IEEE Transactions on Parallel and Distributed Systems

A Cost-Efficient Router Architecture for HPC Inter-Connection Networks: Design and Implementation

Abstract

High-radix routers with lower latency and higher bandwidth play an increasingly important role in constructing large-scale interconnection networks such as those used in supercomputers and datacenters. The tile-based crossbar approach partitions a single large crossbar into many small tiles, which considerably reduces the complexity of arbitration while providing higher throughput than the conventional switch implementation. However, it does not scale well because of power consumption, placement, and routing problems. Inspired by non-saturated throughput theory, this paper proposes a scalable router microarchitecture, termed Multiport Binding Tile-based Router (MBTR). By aggregating multiple physical ports into a single tile, a high-radix router can be flexibly organized into different tile arrays, so the number of tiles and the hardware overhead can be considerably reduced. For a radix-64 router, MBTR achieves a 50% to 75% reduction in memory consumption and in wire area compared with a hierarchical switch. We theoretically deduce the necessary and sufficient conditions for an asymmetrical crossbar to achieve non-saturated relative 100 percent throughput. Based on this result, we analyze the MBTR throughput and derive the condition that the MBTR design parameters must satisfy to yield 100 percent throughput. We further discuss how to trade off MBTR parameters under constraints on performance, power, and area. The simulation results demonstrate that MBTR is indistinguishable from the YARC router in terms of throughput and delay, and can even outperform it by reducing potential contention for output ports. We have fabricated a 36-port MBTR chip at 28 nm, providing 100 Gb/s bidirectional bandwidth per port, with a fall-through latency of just 30 ns. Internally it runs at 9.6 Tb/s, thus offering a speedup of 1.34x.
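
As a rough illustration of how binding several ports to one tile shrinks the tile array, the sketch below enumerates the rectangular tile arrays available for a given radix. It is not taken from the paper: the function name `tile_layouts`, the parameter `ports_per_tile`, and the assumptions that every port belongs to exactly one tile and that the array is rectangular are all hypothetical simplifications.

```python
from typing import List, Tuple


def tile_layouts(radix: int, ports_per_tile: int) -> List[Tuple[int, int]]:
    """Enumerate (rows, cols) tile arrays for a router of the given radix.

    Assumption (not from the paper): each physical port is bound to exactly
    one tile, so the tile count is radix / ports_per_tile, and the tiles are
    arranged in a full rectangular array.
    """
    if radix <= 0 or ports_per_tile <= 0:
        raise ValueError("radix and ports_per_tile must be positive")
    if radix % ports_per_tile != 0:
        raise ValueError("radix must be divisible by ports_per_tile")
    tiles = radix // ports_per_tile
    # Every divisor of the tile count gives one rectangular arrangement.
    return [(rows, tiles // rows) for rows in range(1, tiles + 1) if tiles % rows == 0]


if __name__ == "__main__":
    # Compare one port per tile with two and four ports bound per tile.
    for bound in (1, 2, 4):
        layouts = tile_layouts(64, bound)
        print(f"ports/tile={bound}: {64 // bound} tiles, layouts={layouts}")
```

Under these assumptions, binding four ports per tile in a radix-64 router leaves 16 tiles instead of 64, which is the kind of reduction in tile count the abstract describes. The actual MBTR parameters and their throughput constraints are derived in the paper itself.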
