Optimizing shared cache behavior of chip multiprocessors

Abstract

One of the critical problems associated with emerging chip multiprocessors (CMPs) is the management of on-chip shared cache space. Unfortunately, single-processor-centric data locality optimization schemes may not work well for CMPs, because data accesses from multiple cores can create conflicts in the shared cache space. The main contribution of this paper is a compiler-directed code restructuring scheme for enhancing the locality of shared data in CMPs. The proposed scheme targets the last-level shared cache found in many commercial CMPs and has two components: allocation, which determines the set of loop iterations assigned to each core, and scheduling, which determines the order in which the iterations assigned to a core are executed. Our scheme restructures the application code so that different cores operate on shared data blocks at the same time, to the extent allowed by data dependencies. This helps reduce reuse distances for the shared data and improves on-chip cache performance. We evaluated our approach on the Splash-2 and Parsec applications, through both simulations and experiments on two commercial multi-core machines. Our experimental evaluation indicates that, when both allocation and scheduling are used, the proposed data locality optimization scheme reduces inter-core conflict misses in the shared cache by 67% on average. In addition, the execution time improvements we achieve (29% on average) are very close to the optimal savings achievable with a hypothetical scheme.
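The abstract's central argument is that scheduling iterations so that cores touch the same shared block at about the same time shortens reuse distances. The Python sketch below illustrates only that argument on a toy trace. It is not the authors' compiler algorithm and makes no attempt to model their allocation or dependence analysis.

It rests on these assumptions:
- Two cores run in lock step, each issuing one access per step.
- There are eight hypothetical shared blocks.
- The cache is fully associative LRU with room for four blocks. In such a cache, a re-access hits exactly when its reuse distance is smaller than the capacity.

The cache is fully associative, so the sketch captures capacity effects but not the set conflicts the paper measures. The names `interleave`, `reuse_distances`, and `lru_hits` are hypothetical.

```python
"""Toy illustration: aligning per-core iteration order shortens shared-data reuse distances."""
from typing import Dict, List, Sequence


def interleave(schedules: Sequence[Sequence[int]]) -> List[int]:
    """Merge per-core block-access orders into one shared-cache access stream.

    Assumption (hypothetical): all cores run in lock step, and at time
    step t each core issues its t-th access, in core-index order.
    """
    stream: List[int] = []
    for step in zip(*schedules):
        stream.extend(step)
    return stream


def reuse_distances(stream: Sequence[int]) -> List[int]:
    """Reuse distance of every re-access: the number of distinct other
    blocks touched since the previous access to the same block."""
    last_seen: Dict[int, int] = {}
    distances: List[int] = []
    for i, block in enumerate(stream):
        if block in last_seen:
            distances.append(len(set(stream[last_seen[block] + 1 : i])))
        last_seen[block] = i
    return distances


def lru_hits(distances: Sequence[int], capacity: int) -> int:
    """In a fully associative LRU cache holding `capacity` blocks, a
    re-access hits exactly when its reuse distance is below `capacity`."""
    return sum(1 for d in distances if d < capacity)


def main() -> None:
    blocks = list(range(8))  # hypothetical shared data blocks
    capacity = 4             # hypothetical LRU cache size, in blocks
    # Baseline: the two cores walk the shared blocks in opposite orders.
    baseline = interleave([blocks, blocks[::-1]])
    # Restructured: both cores touch the same shared block in the same step.
    aligned = interleave([blocks, blocks])
    for name, stream in (("baseline", baseline), ("aligned", aligned)):
        dists = reuse_distances(stream)
        print(name, dists, "LRU hits:", lru_hits(dists, capacity), "of", len(dists))


if __name__ == "__main__":
    main()
```

Running it prints `baseline [0, 1, 2, 3, 4, 5, 6, 7] LRU hits: 4 of 8` and then `aligned [0, 0, 0, 0, 0, 0, 0, 0] LRU hits: 8 of 8`. When the two cores walk the shared blocks in opposite orders, every shared block is reused at a different distance, and half of the reuses overflow the four-block cache. Aligning the order brings every reuse distance to zero.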

Bibliographic details

Source
Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009, pp. 505-516 (12 pages)
Authors
Kandemir Mahmut; Muralidhara Sai Prashanth; Narayanan Sri Hari Krishna; Zhang Yuanrui; Ozturk Ozcan
Original format: PDF
Keywords
Algorithm; Design; Experimentation; Performance

Similar documents

1. Adaptive Set Pinning: Managing Shared Caches in Chip Multiprocessors [J]. Shekhar Srikantaiah, Mahmut Kandemir, Mary Jane Irwin. Computer Architecture News, 2008, no. 1.

2. Adaptive set pinning: managing shared caches in chip multiprocessors [J]. Shekhar Srikantaiah, Mahmut Kandemir, Mary Jane Irwin. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2008, no. 3.

3. An LRU-based Replacement Algorithm Augmented with Frequency of Access in Shared Chip-Multiprocessor Caches [J]. Haakon Dybdahl, Per Stenstroem, Lasse Natvig. Computer Architecture News, 2007, no. 4.

4. Optimizing shared cache behavior of chip multiprocessors [C]. Mahmut Kandemir, Sai Prashanth Muralidhara, Sri Hari Krishna Narayanan. Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, 2009.

5. Performance evaluation of TLB consistency solutions in large-scale shared-memory multiprocessors with consistent caches [D]. Ketan A. Maydeo, 2005.

6. Single-nucleotide polymorphisms in cachexia-related genes: Can they optimize the treatment of cancer cachexia? [O]. Junichi Ishida, Masakazu Saitoh, Jochen Springer, 2017.

7. Optimizing shared cache behavior of chip multiprocessors [O]. Kandemir, M., Muralidhara, S.P., Narayanan, S.H.K., 2009.
