Journal: IEEE Transactions on Parallel and Distributed Systems

Compiler analysis for cache coherence: interprocedural array data-flow analysis and its impact on cache performance


Abstract

In this paper, we present compiler algorithms for detecting references to stale data in shared-memory multiprocessors. The approach combines two key analysis techniques: stale reference detection and locality-preserving analysis. Stale reference detection finds the memory reference patterns that may violate cache coherence, while locality-preserving analysis minimizes the number of such stale references by analyzing both temporal and spatial reuse. By computing the array regions referenced inside loops, we extend the previous scalar algorithms for more precise analysis. We develop a full interprocedural array data-flow algorithm, which performs both bottom-up side-effect analysis and top-down context analysis on the procedure call graph to further exploit locality across procedure boundaries. The interprocedural algorithm eliminates the cache invalidations at procedure boundaries that previous compiler algorithms assumed. We have fully implemented the algorithm in the Polaris parallelizing compiler. Using execution-driven simulations of Perfect Club benchmarks, we demonstrate how unnecessary cache misses can be eliminated by automatic stale reference detection. The algorithm can be used to implement cache coherence in shared-memory multiprocessors that do not have hardware directories, such as the Cray T3D.
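
The idea of stale reference detection over array regions can be illustrated with a toy sketch. This is not the paper's algorithm: it assumes a program is a flat list of barrier-separated epochs with no data races inside an epoch, models array regions as one-dimensional Python ranges, and tracks state per element rather than with region descriptors or data-flow equations. The names `Access` and `find_stale_reads` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Access:
    proc: int        # processor issuing the access (assumed statically known)
    array: str       # array name
    region: range    # 1-D index region touched by the access
    is_write: bool

def find_stale_reads(epochs):
    """Return (epoch_index, Access) pairs for reads that may hit a stale cached copy.

    Toy model: a read is potentially stale if the reading processor may hold
    a cached copy of an element that another processor wrote in a later epoch.
    """
    procs = {a.proc for epoch in epochs for a in epoch}
    cached = set()   # (proc, array, index): proc may hold a cached copy
    stale = set()    # (proc, array, index): that cached copy may be outdated
    flagged = []
    for k, epoch in enumerate(epochs):
        # Reads first: within an epoch we assume no cross-processor races.
        for acc in (a for a in epoch if not a.is_write):
            keys = {(acc.proc, acc.array, i) for i in acc.region}
            if keys & stale:
                flagged.append((k, acc))
                stale -= keys        # a flagged read refetches fresh data
            cached |= keys
        # Writes make every other processor's cached copy potentially stale.
        for acc in (a for a in epoch if a.is_write):
            for i in acc.region:
                own = (acc.proc, acc.array, i)
                cached.add(own)
                stale.discard(own)
                for q in procs - {acc.proc}:
                    if (q, acc.array, i) in cached:
                        stale.add((q, acc.array, i))
    return flagged

epochs = [
    [Access(0, "A", range(0, 8), True), Access(1, "A", range(8, 16), True)],
    [Access(0, "A", range(0, 8), False), Access(1, "A", range(0, 4), False)],
    [Access(0, "A", range(0, 8), True)],
    [Access(1, "A", range(0, 4), False), Access(0, "A", range(8, 16), False)],
]
flagged = find_stale_reads(epochs)
```

In this example only processor 1's read of `A[0:4]` in the last epoch is flagged, because processor 0 overwrote that region after processor 1 cached it. The other reads are left as ordinary cached reads, which is the kind of locality the abstract's locality-preserving analysis aims to keep.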
