Published in: IEEE Transactions on Parallel and Distributed Systems

FP-NUCA: A Fast NOC Layer for Implementing Large NUCA Caches

Abstract

NUCA caches have traditionally been proposed as a way to mitigate wire delays and the delays introduced by complex networks on chip (NOCs). Traditional approaches have reported significant performance gains with intelligent block placement, location, replication, and migration schemes. In this paper, we propose a novel approach in this space, called FP-NUCA. It differs from conventional approaches in that it relies on co-designing the last-level cache and the network on chip. We artificially constrain the communication pattern in the NUCA cache such that all messages travel along a few predefined paths (fast paths) for each set of banks. We leverage this communication pattern by designing a new type of NOC router, which augments a regular router with a layer of circuitry that gates the clock of the regular router when there is a message waiting to be transmitted. Messages along the fast paths do not require buffering, switching, or routing. We incorporate a bank predictor with our novel NOC to reduce the number of messages and the resultant energy consumption. We compare our performance with state-of-the-art protocols and report speedups of up to 31 percent (mean: 6.3 percent) and reductions of up to 46 percent (mean: 10.4 percent) for a suite of Splash and Parsec benchmarks. We implement the router in VHDL and show that the additional fast path logic has minimal area and timing overheads.
