Journal: IEEE Transactions on Computers

Understanding why correlation profiling improves the predictability of data cache misses in nonnumeric applications

Abstract

Latency-tolerance techniques offer the potential for bridging the ever-increasing speed gap between the memory subsystem and today's high-performance processors. However, to fully exploit the benefit of these techniques, one must be careful to apply them only to the dynamic references that are likely to suffer cache misses; otherwise, the runtime overheads can potentially offset any gains. In this paper, we focus on isolating dynamic miss instances in nonnumeric applications, which is a difficult but important problem. Although compilers cannot statically analyze data locality in nonnumeric applications, one viable approach is to use profiling information to measure the actual miss behavior. Unfortunately, the state of the art in cache miss profiling (which we call summary profiling) is inadequate for references with intermediate miss ratios: it either misses opportunities to hide latency or inserts unnecessary overhead. To overcome this problem, we propose and evaluate a new profiling technique that helps predict which dynamic instances of a static memory reference will hit or miss in the cache: correlation profiling. Our experimental results demonstrate that roughly half of the 21 nonnumeric applications we study can potentially enjoy significant reductions in memory stall time by exploiting at least one of the three forms of correlation profiling we consider: control-flow correlation, self correlation, and global correlation. In addition, our detailed case studies illustrate that self correlation succeeds because a given reference's cache outcomes often contain repeated patterns, and control-flow correlation succeeds because cache outcomes are often call-chain dependent. Finally, we suggest a number of ways to exploit correlation profiling in practice and demonstrate that software prefetching can achieve better performance on a modern superscalar processor when directed by correlation profiling rather than summary profiling information.
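
To make the contrast between summary profiling and self correlation concrete, the following Python sketch compares the two on a single static reference. It is an illustration built on assumptions and is not the authors' implementation. The abstract does not specify how the profiles are collected or how long the outcome history is, so the function names, the `history` parameter, and the synthetic trace are all hypothetical. The sketch also scores predictions on the same trace used to build the profile, which a real evaluation would not do.

```python
from collections import defaultdict


def summary_accuracy(outcomes):
    """Best accuracy when one decision (always-hit or always-miss) is
    applied to every dynamic instance, as a single miss ratio would allow."""
    misses = sum(outcomes)
    return max(misses, len(outcomes) - misses) / len(outcomes)


def self_correlation_accuracy(outcomes, history=3):
    """Best accuracy when the decision may depend on the reference's own
    previous `history` outcomes (an assumed form of self correlation)."""
    counts = defaultdict(lambda: [0, 0])  # pattern -> [hits, misses]
    for i in range(history, len(outcomes)):
        pattern = tuple(outcomes[i - history:i])
        counts[pattern][outcomes[i]] += 1
    total = sum(h + m for h, m in counts.values())
    # For each history pattern, predict its majority outcome.
    correct = sum(max(h, m) for h, m in counts.values())
    return correct / total


# Hypothetical trace for one static reference: 1 = miss, 0 = hit.
# It misses on every fourth access, giving an intermediate miss ratio of 25%.
trace = [1, 0, 0, 0] * 50

print(summary_accuracy(trace))
print(self_correlation_accuracy(trace, history=3))
```

On this synthetic trace, a single per-reference decision is right for only three quarters of the dynamic instances, whereas the three-outcome history identifies every miss because the pattern repeats exactly. Real traces are noisier, so the gap would generally be smaller.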
