An adaptive migration-replication scheme (AMR) for shared cache in chip multiprocessors

Chaturvedi Nitin; Subramaniyan Arun; Gurunarayanan S.


Abstract

Most of today's chip multiprocessors implement last-level shared caches as non-uniform cache architectures. A major problem faced by such multicore architectures is cache line placement, especially in scenarios where multiple cores compete for line usage in the single non-uniform shared L2 cache. Block migration has been suggested to overcome the problem of optimum placement of cache blocks. Previous research, however, shows that an uncontrolled block migration scheme leads to scenarios where a cache line 'ping-pongs' between two requesting cores, resulting in higher access latency for both requestors and greater power dissipation. To address this problem, this paper first proposes a mechanism to dynamically profile data block usage from different cores on the chip. We then propose an adaptive migration-replication scheme (AMR) for shared last-level non-uniform cache architectures that adapts between two actions: selectively replicating frequently used cache lines near the requesting cores, and migrating a cache line towards the requesting core when requests are fewer. AMR eliminates 'ping-ponging' of cache lines between the banks of the requesting cores. However, any mechanism that dynamically adapts between migration and replication at runtime is bound to have a complex search scheme to locate data blocks. To simplify the data lookup policy, this work also presents an efficient data access mechanism for non-uniform cache architectures. Our proposal relies on low-overhead and highly accurate in-hardware pointers to keep track of the on-chip location of the cache block. We show that our proposed scheme reduces the completion time by an average of 12.25%, 8.1% and 3%, and energy consumption by 11.65%, 8.5% and 2.1%, when compared to the state-of-the-art last-level cache management schemes S-NUCA, D-NUCA and HK-NUCA, respectively. SPEC and PARSEC benchmarks were used to thoroughly evaluate our proposal.
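The abstract gives no thresholds, counter widths or pointer formats, so the following is only a toy sketch of the general idea it describes: profile accesses per core, replicate a line near a core that uses it frequently, and otherwise migrate it one bank towards the requester. It is not the authors' AMR design. The names `LineState`, `on_access`, `nearest_copy` and the constant `REPLICATE_THRESHOLD` are all hypothetical, banks are laid out on a one-dimensional line for simplicity, and writes and coherence are ignored.

```python
from dataclasses import dataclass, field

# Hypothetical value: the abstract does not state any threshold.
REPLICATE_THRESHOLD = 3


@dataclass
class LineState:
    """Toy per-line bookkeeping (stands in for profiling counters and location pointers)."""
    home_bank: int                                # bank holding the primary copy
    replicas: set = field(default_factory=set)    # banks holding replicas
    counts: dict = field(default_factory=dict)    # accesses seen per requesting bank


def on_access(line: LineState, core_bank: int) -> str:
    """Record one access from the core next to `core_bank` and pick an action."""
    line.counts[core_bank] = line.counts.get(core_bank, 0) + 1

    # A copy already sits next to the requester: nothing to move.
    if core_bank == line.home_bank or core_bank in line.replicas:
        return "hit"

    # Frequently used by this core: place a replica near it.
    if line.counts[core_bank] >= REPLICATE_THRESHOLD:
        line.replicas.add(core_bank)
        return "replicate"

    # Fewer requests so far: move the primary copy one bank closer.
    line.home_bank += 1 if core_bank > line.home_bank else -1
    return "migrate"


def nearest_copy(line: LineState, core_bank: int) -> int:
    """Return the closest bank holding a copy (1-D distance, an assumption)."""
    copies = {line.home_bank} | line.replicas
    return min(copies, key=lambda bank: abs(bank - core_bank))


if __name__ == "__main__":
    # Two cores at banks 0 and 7 alternate on a line that starts in bank 3.
    line = LineState(home_bank=3)
    for bank in [0, 7, 0, 7, 0, 7, 0, 7]:
        print(bank, on_access(line, bank))
```

In this toy run the line first moves back and forth between the two requesters, which is the ping-pong pattern the abstract warns about; once each core's counter reaches the assumed threshold, both receive a replica and later accesses are served locally.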


Bibliographic record

Source
Journal of supercomputing | 2015, Issue 10 | pp. 3904-3933 | 30 pages
Authors
Chaturvedi Nitin; Subramaniyan Arun; Gurunarayanan S.
Author affiliations

Birla Inst Technol & Sci, Elect Elect Engn Dept, Pilani, Rajasthan, India;

Birla Inst Technol & Sci, Elect Elect Engn Dept, Pilani, Rajasthan, India;

Birla Inst Technol & Sci, Elect Elect Engn Dept, Pilani, Rajasthan, India;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
Chip multiprocessors (CMP); Non-uniform cache architecture (NUCA);


