IEEE Transactions on Parallel and Distributed Systems

Codesign of NoC and Cache Organization for Reducing Access Latency in Chip Multiprocessors



Abstract

Reducing data access latency is vital to achieving performance improvements in computing. For chip multiprocessors (CMPs), data access latency depends on the organization of the memory hierarchy, the on-chip interconnect, and the running workload. Several network-on-chip (NoC) designs exploit communication locality to reduce communication latency by configuring special fast paths or circuits on which communication is faster than on the rest of the NoC. However, communication patterns are directly affected by the cache organization, and many cache organizations are designed in isolation from the underlying NoC or assume a simple NoC design, thus possibly missing optimization opportunities. In this work, we take a codesign approach to the NoC and the cache organization. First, we propose a hybrid circuit/packet-switched NoC that exploits communication locality through periodic configuration of the most beneficial circuits. Second, we design a Unique Private (UP) caching scheme targeting the class of interconnects that exploit communication locality to improve communication latency. The Unique Private cache stores the data that are mostly accessed by each processor core in the core's locally accessible cache bank, while leveraging dedicated high-speed circuits in the interconnect to provide remote cores with fast access to shared data. Simulations of a suite of scientific and commercial workloads show that our proposed design achieves speedups of 15.2 and 14 percent on a 16-core and a 64-core CMP, respectively, over the state-of-the-art NoC-cache codesigned system that also exploits communication locality in multithreaded applications.
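The periodic circuit configuration mentioned in the abstract can be illustrated with a minimal sketch: at the end of each epoch, rank core pairs by observed message traffic and dedicate the limited circuit budget to the heaviest pairs. The function name, the greedy top-k policy, and the traffic numbers below are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch of a periodic circuit-selection step: every epoch,
# pick the core pairs with the heaviest traffic and give them fast circuits.
from collections import Counter

def select_circuits(traffic, max_circuits):
    """Return up to max_circuits (src, dst) pairs with the highest message
    counts; ties are broken by pair order for determinism."""
    ranked = sorted(traffic.items(), key=lambda kv: (-kv[1], kv[0]))
    return [pair for pair, _count in ranked[:max_circuits]]

# One epoch of observed messages between cores: (src, dst) -> count.
epoch_traffic = Counter({
    (0, 3): 120,  # e.g., core 0 frequently accessing core 3's cache bank
    (1, 3): 95,
    (2, 7): 40,
    (5, 6): 12,
})

circuits = select_circuits(epoch_traffic, max_circuits=2)
print(circuits)  # -> [(0, 3), (1, 3)]
```

In a hybrid circuit/packet-switched NoC, traffic on the selected pairs would traverse the preconfigured circuits, while all other traffic falls back to ordinary packet switching.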
