Journal: Parallel Computing

Sharded Router: A novel on-chip router architecture employing bandwidth sharding and stealing



Abstract

Packet-based networks-on-chip (NoCs) are considered among the most viable candidates for the on-chip interconnection network of many-core chips. Unrelenting increases in the number of processing elements on a single chip die necessitate a scalable and efficient communication fabric. The resulting growth in on-chip network size has been accompanied by a corresponding widening of the physical inter-router channels. However, the growing link bandwidth is not fully utilized, because the packet size is not always a multiple of the channel width. While slicing the physical channel improves link utilization, it incurs additional delay because the number of flits per packet also increases. This paper proposes a novel router micro-architecture that employs fine-grained bandwidth "sharding" (i.e., partitioning) and stealing in order to mitigate the increase in zero-load latency caused by slicing. Consequently, the zero-load latency of the Sharded Router is identical to that of a conventional router, whereas its throughput is markedly improved by fully utilizing all available bandwidth. Detailed experiments using a full-system simulation framework indicate that the proposed router reduces the average network latency by up to 19% and the execution time of real multi-threaded workloads by up to 43%. Finally, hardware synthesis analysis confirms that the Sharded Router incurs only a modest area overhead relative to a conventional design.
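The utilization and latency trade-off described above can be made concrete with a small calculation. The sketch below is an illustrative toy, not the Sharded Router's actual micro-architecture or allocation policy: it assumes hypothetical sizes (a 128-bit channel, 72-bit control and 576-bit data packets, four shards) that do not come from the paper, and it models sharding and stealing as a simple oldest-first allocator in which shards a packet cannot fill are taken by the next waiting packet. All function names are hypothetical.

```python
def flits_per_packet(packet_bits: int, flit_bits: int) -> int:
    """Flits needed to carry one packet; the last flit may be only partly filled."""
    return (packet_bits + flit_bits - 1) // flit_bits


def link_utilization(packet_bits: int, flit_bits: int) -> float:
    """Fraction of the transmitted flit bits that actually carry packet data."""
    return packet_bits / (flits_per_packet(packet_bits, flit_bits) * flit_bits)


def drain_cycles_conventional(packets: list[int], channel_bits: int) -> int:
    """Each packet holds the whole channel, sending one full-width flit per cycle."""
    return sum(flits_per_packet(p, channel_bits) for p in packets)


def drain_cycles_sharded(packets: list[int], channel_bits: int, num_shards: int) -> int:
    """Hypothetical toy model: the channel is split into num_shards equal shards.
    Every cycle, packets are served oldest-first; each takes as many shards as it
    can still fill, and any shards it cannot use are 'stolen' by later packets."""
    assert channel_bits % num_shards == 0, "shards must evenly divide the channel"
    shard_bits = channel_bits // num_shards
    # Shard-sized pieces still owed by each packet.
    owed = [flits_per_packet(p, shard_bits) for p in packets]
    cycles = 0
    while any(owed):
        free = num_shards
        for i, need in enumerate(owed):
            if free == 0:
                break
            take = min(need, free)
            owed[i] -= take
            free -= take
        cycles += 1
    return cycles


if __name__ == "__main__":
    W = 128  # hypothetical channel width in bits
    for p in (72, 576):  # hypothetical control and data packet sizes in bits
        # full-width flits vs. 32-bit sliced flits
        print(p, flits_per_packet(p, W), link_utilization(p, W),
              flits_per_packet(p, W // 4), link_utilization(p, W // 4))
    batch = [72, 576, 72, 72]
    print(drain_cycles_conventional(batch, W), drain_cycles_sharded(batch, W, 4))
```

With these assumed numbers, slicing the 128-bit channel into 32-bit flits raises the utilization of a 72-bit packet from 0.5625 to 0.75 but stretches a 576-bit packet from 5 flits to 18. The toy sharded channel keeps a lone 576-bit packet at 5 cycles, the same as the conventional channel, while draining the four-packet batch in 7 cycles instead of 8, because shards left over by one packet are filled by the next.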
