Published in: IEEE Transactions on Computers

Thread Criticality Assisted Replication and Migration for Chip Multiprocessor Caches



Abstract

Non-Uniform Cache Architecture (NUCA) is a viable way to mitigate the large on-chip wire delay that results from the rapid growth in the cache capacity of chip multiprocessors (CMPs). When the last-level cache (LLC) is partitioned into smaller banks connected by an on-chip network, the access latency exhibits a non-uniform distribution. Prior work has thoroughly explored NUCA design, including block migration, block replication, and block searching. However, all previous NUCA mechanisms are thread-oblivious when multi-threaded applications are deployed on CMP systems. Because of interference on shared resources, threads often progress unevenly, and the lagging threads are more critical to overall performance. In this paper, we propose a novel NUCA design called thread Criticality Assisted Replication and Migration (CARM). CARM uses runtime thread criticality information as hints to adjust block replication and migration in NUCA. Specifically, CARM aims to speed up parallel application execution by prioritizing block replication and migration for critical threads. Full-system experimental results show that CARM reduces the execution time of a set of PARSEC workloads by 13.7 and 6.8 percent on average compared with traditional D-NUCA and Re-NUCA, respectively. Moreover, CARM also consumes much less energy than the evaluated schemes.
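
The idea of using thread criticality as a hint for replication can be illustrated with a small sketch. This is not the mechanism from the paper, which the abstract does not specify. The 4x4 mesh, the progress-counter criticality score, the 0.5 threshold, and every function name below are assumptions chosen only for illustration. The sketch models non-uniform latency as Manhattan hop distance between banks and allows replication into the requester's local bank only for threads that lag behind.

```python
MESH_WIDTH = 4  # assumption: a 4x4 mesh of LLC banks, one bank per core


def bank_coords(bank_id: int) -> tuple[int, int]:
    """Map a bank id to (x, y) coordinates in the hypothetical mesh."""
    return bank_id % MESH_WIDTH, bank_id // MESH_WIDTH


def hop_distance(src_bank: int, dst_bank: int) -> int:
    """Manhattan distance in hops, used here as a proxy for access latency."""
    sx, sy = bank_coords(src_bank)
    dx, dy = bank_coords(dst_bank)
    return abs(sx - dx) + abs(sy - dy)


def criticality(progress: dict[int, int]) -> dict[int, float]:
    """Score each thread in [0, 1]; the slowest thread scores 1.0.

    `progress` maps thread id to a hypothetical progress counter
    (for example, completed work units). This metric is an assumption.
    """
    fastest = max(progress.values())
    slowest = min(progress.values())
    span = fastest - slowest
    if span == 0:
        # All threads are level, so none is treated as critical.
        return {tid: 0.0 for tid in progress}
    return {tid: (fastest - p) / span for tid, p in progress.items()}


def should_replicate(
    thread_id: int,
    home_bank: int,
    local_bank: int,
    scores: dict[int, float],
    threshold: float = 0.5,  # assumption: illustrative cutoff
) -> bool:
    """Replicate a block locally only for sufficiently critical threads."""
    if hop_distance(local_bank, home_bank) == 0:
        return False  # the block is already in the requester's bank
    return scores[thread_id] >= threshold


if __name__ == "__main__":
    scores = criticality({0: 100, 1: 40, 2: 70})
    print(scores)
    print(should_replicate(1, home_bank=15, local_bank=1, scores=scores))
    print(should_replicate(0, home_bank=15, local_bank=0, scores=scores))
```

In this toy model, thread 1 has made the least progress, so its remote blocks qualify for local replication, while the leading thread 0 keeps accessing its blocks remotely. A real design would also have to account for capacity pressure and coherence, which the sketch ignores.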
