Journal: IEEE Transactions on Parallel and Distributed Systems

A Cost-Efficient Router Architecture for HPC Inter-Connection Networks: Design and Implementation

Abstract

High-radix routers with lower latency and higher bandwidth play an increasingly important role in constructing large-scale interconnection networks such as those used in supercomputers and datacenters. The tile-based crossbar approach partitions a single large crossbar into many small tiles and can considerably reduce the complexity of arbitration while providing higher throughput than the conventional switch implementation. However, it is not scalable due to power consumption, placement, and routing problems. Inspired by non-saturated throughput theory, this paper proposes a scalable router microarchitecture, termed Multiport Binding Tile-based Router (MBTR). By aggregating multiple physical ports into a single tile, a high-radix router can be flexibly organized into different tile arrays, so the number of tiles and the hardware overhead can be considerably reduced. For a radix-64 router, MBTR reduces memory consumption and wire area by up to 50 to 75 percent compared with a hierarchical switch. We theoretically deduce the necessary and sufficient conditions for the asymmetrical crossbar to achieve unsaturated relative 100 percent throughput. Based on this observation, we analyze the MBTR throughput and derive the condition that the MBTR design parameters must satisfy to yield 100 percent throughput. We further discuss how to trade off MBTR parameters under the constraints of performance, power, and area. Simulation results demonstrate that MBTR is indistinguishable from the YARC router in terms of throughput and delay, and can even outperform it by reducing potential contention for output ports. We have fabricated a 36-port MBTR chip at 28 nm, providing 100 Gb/s bidirectional bandwidth per port, with a fall-through latency of just 30 ns. Internally, it runs at 9.6 Tb/s, thus offering a speedup of 1.34x.
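
To make the port-binding idea concrete, the following is a minimal sketch of how the tile count and the possible tile-array shapes change as more ports are bound to each tile. It is an illustration only, not the paper's design procedure: the function name `tile_arrays`, the parameter `ports_per_tile`, and the assumption that every tile binds the same number of ports are all hypothetical.

```python
def tile_arrays(radix, ports_per_tile):
    """Return (tile_count, shapes) for a router with `radix` ports.

    Assumption (not from the paper): every tile binds exactly
    `ports_per_tile` ports, so tile_count = radix // ports_per_tile.
    `shapes` lists every (rows, cols) grid with rows * cols == tile_count.
    """
    if ports_per_tile <= 0 or radix % ports_per_tile != 0:
        raise ValueError("radix must be a positive multiple of ports_per_tile")
    tiles = radix // ports_per_tile
    # Enumerate all factor pairs of the tile count as candidate array shapes.
    shapes = [(r, tiles // r) for r in range(1, tiles + 1) if tiles % r == 0]
    return tiles, shapes


# Radix-64 example: binding more ports per tile shrinks the tile array.
for k in (1, 2, 4):
    tiles, shapes = tile_arrays(64, k)
    print(f"ports/tile={k}: {tiles} tiles, shapes={shapes}")
```

Under these assumptions, one port per tile gives 64 tiles for a radix-64 router (for example an 8 x 8 array), while binding four ports per tile gives 16 tiles (for example a 4 x 4 array). Which shape is preferable depends on the throughput, power, and area constraints the abstract mentions, which this sketch does not model.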
