FP-NUCA: A Fast NOC Layer for Implementing Large NUCA Caches

Arora Anuj; Harne Mayur; Sultan Hameedah; Bagaria Akriti; Sarangi Smruti R.

Abstract

NUCA caches have traditionally been proposed as a solution for mitigating wire delays and the delays introduced by complex networks on chip. Traditional approaches have reported significant performance gains with intelligent block placement, location, replication, and migration schemes. In this paper, we propose a novel approach in this space, called FP-NUCA. It differs from conventional approaches and relies on a novel method of co-designing the last-level cache and the network on chip. We artificially constrain the communication pattern in the NUCA cache such that all messages travel along a few predefined paths (fast paths) for each set of banks. We leverage this communication pattern by designing a new type of NOC router called the Freeze router, which augments a regular router by adding a layer of circuitry that gates the clock of the regular router when there is a message waiting to be transmitted. Messages along the fast paths do not require buffering, switching, or routing. We incorporate a bank predictor with our novel NOC to reduce the number of messages and the resultant energy consumption. We compare our performance with state-of-the-art protocols and report speedups of up to 31 percent (mean: 6.3 percent), and reduction up to 46 percent (mean: 10.4 percent) for a suite of Splash and Parsec benchmarks. We implement the router in VHDL and show that the additional fast path logic has minimal area and timing overheads.
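The router behaviour described in the abstract can be illustrated with a toy cycle-level model. This is only a sketch under stated assumptions, not the authors' VHDL design: the class `ToyFreezeRouter`, its methods, and the one-message-per-cycle timing are all hypothetical. It assumes that a message arriving on a fast path is forwarded in the same cycle without entering any buffer, and that the regular router is frozen (its clock gated) for that cycle.

```python
from collections import deque


class ToyFreezeRouter:
    """Hypothetical toy model of a router with a fast-path layer."""

    def __init__(self):
        self.regular_queue = deque()  # buffered traffic handled by the regular router
        self.frozen_cycles = 0        # cycles in which the regular router was clock-gated

    def enqueue_regular(self, msg):
        # Regular messages are buffered and wait for the regular router.
        self.regular_queue.append(msg)

    def step(self, fast_path_msg=None):
        """Advance one cycle and return the message leaving the router, if any."""
        if fast_path_msg is not None:
            # Assumption: the fast-path layer forwards the message directly
            # (no buffering, switching, or routing) and freezes the regular router.
            self.frozen_cycles += 1
            return fast_path_msg
        if self.regular_queue:
            # No fast-path traffic this cycle: the regular router makes progress.
            return self.regular_queue.popleft()
        return None


router = ToyFreezeRouter()
router.enqueue_regular("r1")
router.enqueue_regular("r2")

# Fast-path messages arrive in cycles 0 and 2; the other cycles are idle on the fast path.
trace = [router.step("f1"), router.step(), router.step("f2"), router.step(), router.step()]
print(trace)                 # ['f1', 'r1', 'f2', 'r2', None]
print(router.frozen_cycles)  # 2
```

In this model the buffered messages `r1` and `r2` are each delayed by one cycle while a fast-path message passes through, which is the trade-off the constrained communication pattern is meant to exploit.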

Bibliographic details

Source
Parallel and Distributed Systems, IEEE Transactions on | 2015, No. 9 | pp. 2465-2478 | 14 pages
Authors
Arora Anuj; Harne Mayur; Sultan Hameedah; Bagaria Akriti; Sarangi Smruti R.
Affiliation

Department of Computer Science and Engineering, Indian Institute of Technology Delhi, New Delhi, India

Original format: PDF
Language: English
Keywords
NUCA caches; bank prediction; freeze router
