Rapid Memory Footprint Access Diagnostics

Abstract

Footprint and reuse distance measure temporal locality and therefore do not capture the significance of access patterns (spatial locality). A strided access pattern has the largest possible footprint but usually has the best performance. To highlight exposed memory latency, we separate footprint into strided (prefetchable) and irregular (non-prefetchable) access components and calculate the growth rate of each. To compute these footprint access diagnostics rapidly, we present two methods: whole-program and precise. Current footprint analyses can cause slowdowns of 200× or more with realistic inputs and are therefore impractical. Our whole-program method reduces the overhead to 10% by computing upper bounds, but still yields inter-procedural insight through a call path profile. Our precise method uses additional static analysis and profiling to refine the upper bounds for intra-procedural loop nests. We evaluate our approaches using benchmarks that vary access patterns (strided vs. unpredictable), sparsity (all words in a cache line vs. some), and reuse (varying and repeated accesses per element). Notably, for loop nests with unpredictable accesses, the precise method's accuracy is within 10% of ideal. The whole-program method has sufficient accuracy to diagnose bottlenecks.
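
To make the strided/irregular split concrete, the following is a minimal illustrative sketch, not the authors' whole-program or precise method. It assumes a full address trace is available (exactly the expensive input the paper seeks to avoid), a 64-byte cache line, and a simple classification rule: an access counts as strided when its address delta matches the previous delta of the same instruction. The function name `footprint_by_pattern` and the trace layout are hypothetical.

```python
CACHE_LINE_BYTES = 64  # assumed cache line size, not taken from the paper


def footprint_by_pattern(trace, line_bytes=CACHE_LINE_BYTES):
    """Split the footprint of an address trace into strided and irregular parts.

    trace: iterable of (instruction_id, byte_address) pairs (hypothetical layout).
    Returns (strided_lines, irregular_lines): the number of distinct cache
    lines touched by accesses of each kind. A line touched by both kinds of
    access is counted in both totals.
    """
    last_addr = {}    # instruction_id -> previous address
    last_stride = {}  # instruction_id -> previous address delta
    strided = set()
    irregular = set()

    for pc, addr in trace:
        line = addr // line_bytes
        prev = last_addr.get(pc)
        if prev is None:
            # First access by this instruction: no stride is known yet.
            irregular.add(line)
        else:
            stride = addr - prev
            if last_stride.get(pc) == stride:
                # Same delta twice in a row: treat as prefetchable.
                strided.add(line)
            else:
                irregular.add(line)
            last_stride[pc] = stride
        last_addr[pc] = addr

    return len(strided), len(irregular)


# Instruction 0 walks memory one cache line at a time (strided).
strided_trace = [(0, 64 * i) for i in range(10)]
# Instruction 1 jumps between unrelated addresses (irregular).
irregular_trace = [(1, a) for a in (4096, 8000, 4160, 20000, 5000)]

print(footprint_by_pattern(strided_trace + irregular_trace))
```

In this sketch the first two accesses of each instruction are counted as irregular because no repeated stride has been observed yet. Evaluating the function on growing prefixes of a trace gives a rough view of how each component's footprint grows over time.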