
The timekeeping methodology: Exploiting generational behavior to improve processor power and performance.



Abstract

Today's CPU designers face increasingly aggressive CPU performance goals while also dealing with challenging limits on CPU power dissipation. The conflict between performance and power requirements increases the importance of simple but effective solutions for the widening gap between processor and memory performance. My research has demonstrated how aspects of processor and memory behavior can be optimized by exploiting knowledge about the time durations between key processor and memory events. These “timekeeping” techniques can give performance or power improvements with simple hardware structures.

In this thesis, the cache memory hierarchy is used as the main example to illustrate the effectiveness of the timekeeping methodology. I start by introducing basic concepts for this methodology, including the generational nature of cache reference streams. Using statistical distributions of key timekeeping metrics such as reload and access intervals, I show how the metrics form the basis for a rich set of policies that can classify and predict program behavior. From these metrics and predictions, hardware mechanisms can be built to optimize the power or performance of the on-chip memory hierarchy.

Three mechanisms are presented to illustrate the application of the timekeeping methodology to the memory system. The first mechanism, cache decay, can reduce cache leakage energy by 4X by identifying long-idle cache lines with simple 2-bit counters and turning them off. The second mechanism, a timekeeping victim cache filter, uses the same counters as cache decay to identify cache lines with short dead times and select them as candidates for the victim buffer. This mechanism can filter out 87% of victim buffer traffic while improving performance. Both cache decay and the victim buffer filter exploit cache line lifetime behavior within single generations.

In the third mechanism, timekeeping prefetch, we demonstrate how to exploit the regularity across consecutive generations of the same cache line. Timekeeping prefetch uses the live time and next address of the previous generation as predictions for the current generation. The resulting prefetcher is highly effective and at the same time hardware-efficient. With an 8KB history table, an average performance improvement of 11% can be achieved across the whole SPEC2000 benchmark suite. This outperforms a recent proposal that uses a 2MB history table.

Outside the memory system, this thesis also shows how the timekeeping methodology can be applied to other subsystems such as branch predictors. A key characteristic of branch predictor data is that they are transient and predictive: they are execution hints that do not affect program correctness, and they are often short-lived. To exploit this characteristic, we propose building branch predictors from naturally decaying 4-transistor memory cells instead of traditional 6-transistor cells. This implementation can reduce branch predictor leakage by about 60-80% while providing a cell area advantage of up to 33%.

The techniques presented in this thesis clearly demonstrate the power of the timekeeping methodology. We expect that in our future work, as well as in work by other researchers, more timekeeping techniques can be proposed to help future processors meet the many challenges in power and performance.
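The cache decay mechanism described in the abstract can be illustrated with a small software model. This is only a sketch under stated assumptions, not the thesis's hardware design: the abstract specifies per-line 2-bit counters that identify long-idle lines, but the direct-mapped organization, the global tick that advances the counters, the rule that a line is switched off when its counter reaches 3, and all names (`DecayCache`, `tick_cycles`, `advance`) are hypothetical choices made here for illustration.

```python
from dataclasses import dataclass
from typing import Optional

COUNTER_MAX = 3  # largest value a 2-bit counter can hold


@dataclass
class Line:
    tag: Optional[int] = None  # None means the line holds no valid data
    counter: int = 0           # per-line 2-bit idle counter
    powered: bool = True       # False once the line has been switched off


class DecayCache:
    """Toy direct-mapped cache with decay (organization is an assumption)."""

    def __init__(self, num_lines: int, tick_cycles: int) -> None:
        self.lines = [Line() for _ in range(num_lines)]
        self.tick_cycles = tick_cycles  # cycles between global ticks (assumed)
        self.cycle = 0

    def advance(self, cycles: int) -> None:
        """Let time pass; every tick_cycles cycles, bump the idle counters."""
        for _ in range(cycles):
            self.cycle += 1
            if self.cycle % self.tick_cycles == 0:
                self._global_tick()

    def _global_tick(self) -> None:
        for line in self.lines:
            if not line.powered or line.tag is None:
                continue  # nothing to decay in an off or empty line
            line.counter += 1
            if line.counter == COUNTER_MAX:
                # Assumed rule: a saturated counter marks the line as
                # long-idle, so it is switched off and its data is lost.
                line.powered = False
                line.tag = None

    def access(self, addr: int) -> bool:
        """Return True on a hit; a miss reloads the line."""
        index = addr % len(self.lines)
        tag = addr // len(self.lines)
        line = self.lines[index]
        hit = line.powered and line.tag == tag
        if not hit:
            line.tag = tag        # reload starts a new generation
            line.powered = True
        line.counter = 0          # any access resets the idle counter
        return hit

    def powered_off(self) -> int:
        """Number of lines currently switched off."""
        return sum(not line.powered for line in self.lines)
```

In this model, a line that is touched at least once every three ticks stays powered, while a line left idle for three consecutive ticks is switched off and its next access becomes a miss. That extra miss is the cost traded against the leakage saved while the line is off, which is why the choice of tick interval matters.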
