Journal of Supercomputing

An adaptive migration-replication scheme (AMR) for shared cache in chip multiprocessors



Abstract

Most of today's chip multiprocessors implement last-level shared caches as non-uniform cache architectures (NUCA). A major problem faced by such multicore architectures is cache line placement, especially when multiple cores compete for lines in the single non-uniform shared L2 cache. Block migration has been suggested as a way to achieve optimal placement of cache blocks. Previous research, however, shows that uncontrolled block migration can make a cache line 'ping-pong' between two requesting cores, which raises access latency for both requestors and increases power dissipation. To address this problem, this paper first proposes a mechanism to dynamically profile data block usage from the different cores on the chip. We then propose an adaptive migration-replication scheme (AMR) for shared last-level non-uniform cache architectures that switches between selectively replicating frequently used cache lines near the requesting cores and migrating a cache line towards the requesting core when there are fewer requests. AMR eliminates 'ping-ponging' of cache lines between the banks of the requesting cores. However, any mechanism that dynamically switches between migration and replication at runtime is bound to need a complex search scheme to locate data blocks. To simplify the data lookup policy, this work also presents an efficient data access mechanism for non-uniform cache architectures. Our proposal relies on low-overhead, highly accurate in-hardware pointers to track the on-chip location of each cache block. We show that, compared with the state-of-the-art last-level cache management schemes S-NUCA, D-NUCA and HK-NUCA, our scheme reduces completion time on average by 12.25%, 8.1% and 3%, and energy consumption by 11.65%, 8.5% and 2.1%, respectively. We evaluated the proposal thoroughly using the SPEC and PARSEC benchmarks.
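The abstract describes the policy only at a high level. The following Python sketch is a hypothetical software model, not the authors' hardware design: the names `MigrationReplicationSketch` and `LocationPointer`, the threshold `REPLICATE_THRESHOLD = 4`, the one-bank-per-core layout, the modulo home-bank mapping and the read-only traffic are all assumptions made for illustration. It shows how per-core usage profiling can choose between migrating a line and replicating it, why that keeps a line from ping-ponging between two requesters, and how a single location pointer kept at the home bank avoids searching every bank.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical tuning knob (assumption): accesses by one core before it gets a replica.
REPLICATE_THRESHOLD = 4


@dataclass
class LocationPointer:
    """Stand-in (assumption) for a per-block pointer kept at the block's home bank."""
    owner_bank: int                              # bank holding the primary copy
    replicas: set = field(default_factory=set)   # banks holding read-only replicas


class MigrationReplicationSketch:
    """Hypothetical model of profile-driven migration vs. replication (reads only)."""

    def __init__(self, num_banks: int, adaptive: bool = True):
        self.num_banks = num_banks
        self.adaptive = adaptive  # False = naive "always migrate" baseline
        self.pointers: dict[int, LocationPointer] = {}
        # Usage profile: block -> core -> number of accesses observed so far.
        self.profile = defaultdict(lambda: defaultdict(int))
        self.migrations = 0
        self.replications = 0

    def home_bank(self, block: int) -> int:
        # Assumed static (S-NUCA-style) address interleaving across banks.
        return block % self.num_banks

    def access(self, core: int, block: int) -> int:
        """Serve a read from `core`, adapt placement, and return the serving bank."""
        local = core  # assumption: core i's nearest bank is bank i
        ptr = self.pointers.setdefault(block, LocationPointer(self.home_bank(block)))
        self.profile[block][core] += 1

        # Lookup: try the local bank, otherwise follow the single pointer
        # instead of broadcasting a search to every bank.
        if local == ptr.owner_bank or local in ptr.replicas:
            return local
        served = ptr.owner_bank
        self._adapt(block, core, ptr)
        return served

    def _adapt(self, block: int, core: int, ptr: LocationPointer) -> None:
        local = core
        if not self.adaptive:
            ptr.owner_bank = local  # naive migration: the line follows every requester
            self.migrations += 1
            return
        uses = self.profile[block]
        shared = any(c != core for c in uses)
        if uses[core] >= REPLICATE_THRESHOLD:
            ptr.replicas.add(local)  # frequently used: replicate near the requester
            self.replications += 1
        elif not shared:
            ptr.owner_bank = local   # few requests from a single user: migrate
            self.migrations += 1
        # Shared but infrequently used: leave the line in place, so it cannot ping-pong.


if __name__ == "__main__":
    for adaptive in (False, True):
        sim = MigrationReplicationSketch(num_banks=4, adaptive=adaptive)
        for _ in range(8):  # cores 0 and 1 alternate reads of the same line
            sim.access(0, 6)
            sim.access(1, 6)
        print(adaptive, sim.migrations, sim.replications)
    # prints "False 16 0" then "True 1 1"
```

With `adaptive=False` the line follows every requester, so two cores alternating on one line migrate it on every access. The profiled variant migrates once, leaves the shared line in place while the second core's requests are infrequent, and then gives that core a local replica once it crosses the threshold. A fuller model would also reset the profile periodically, move lines toward the requester step by step rather than directly, and invalidate replicas on writes; the sketch omits all three.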