2014 22nd Annual IEEE International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems

Quantifying and Optimizing the Impact of Victim Cache Line Selection in Manycore Systems



Abstract

In both architecture and software, the main goal of data locality-oriented optimizations has always been "minimizing the number of cache misses" (especially costly last-level cache misses). However, this paper shows that other metrics, such as the distance between the last-level cache and the memory controller as well as the memory queuing latency, can play an equally important role as far as application performance is concerned. Focusing on a large set of multithreaded applications, we first show that last-level cache "write backs" (memory writes due to displacement of a victim block from the last-level cache) can exhibit significant latencies and variances, and then make a case for "relaxing" the strict LRU policy to save (write back) cycles in both the on-chip network and memory queues. Specifically, we explore novel architecture-level schemes that optimize the on-chip network latency, the memory queuing latency, or both, of the write back messages, by carefully selecting the victim block to write back at the time of cache replacement. Our extensive experimental evaluations using 15 multithreaded applications and a cycle-accurate simulation infrastructure clearly demonstrate that this tradeoff (between cache hit rate and on-chip network/memory queuing latency) pays off in most cases, leading to about 12.2% execution time improvement and 14.9% energy savings in our default 64-core system with 6 memory controllers.
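The idea of "relaxing" strict LRU can be sketched as follows. This is a minimal illustration, not the paper's actual scheme: it assumes a hypothetical per-block cost model in which a dirty block's write-back cost is the sum of its on-chip network distance (hops) to its memory controller and that controller's current queue delay, and it picks the cheapest victim among the k least-recently-used blocks of a set.

```python
# Sketch of relaxed-LRU victim selection (illustrative only).
# Instead of always evicting the strict LRU block, consider the k oldest
# blocks in the set and evict the one whose write-back is cheapest:
# clean blocks cost nothing, dirty blocks pay network hops plus queue delay.
# The cost model and field layout here are assumptions for illustration.

def select_victim(candidates, k=4):
    """candidates: list of (age_rank, dirty, hops_to_mc, mc_queue_delay)
    tuples, ordered oldest first. Returns the index (within the LRU
    window) of the block to evict."""
    window = candidates[:k]  # restrict the choice to the k oldest blocks

    def writeback_cost(block):
        _, dirty, hops, queue_delay = block
        if not dirty:
            return 0  # clean eviction generates no write-back message
        return hops + queue_delay  # on-chip distance + memory queuing

    # Evict the cheapest block within the LRU window; ties favor older blocks
    # because min() keeps the first (oldest) candidate on equal cost.
    return min(range(len(window)), key=lambda i: writeback_cost(window[i]))

# Example: the strict-LRU victim (index 0) is dirty and far from its memory
# controller, while a slightly younger clean block can be evicted for free.
blocks = [(0, True, 8, 20), (1, True, 2, 5), (2, False, 6, 30), (3, True, 1, 1)]
print(select_victim(blocks))  # -> 2 (the clean block)
```

The design point this sketch captures is the tradeoff the abstract describes: picking a non-LRU victim may slightly hurt the hit rate, but it can avoid an expensive write-back traversal of the on-chip network and a long memory-controller queue.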
