Home > Foreign-language conference papers > International conference on antennas and propagation; ICAP 89 > Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes

Multi-chain prefetching: effective exploitation of inter-chain memory parallelism for pointer-chasing codes


Abstract

We present multi-chain prefetching, a technique that uses offline analysis and a hardware prefetch engine to prefetch multiple independent pointer chains simultaneously, thus exploiting inter-chain memory parallelism to tolerate memory latency. This paper makes three contributions. First, we introduce a scheduling algorithm that identifies independent pointer chains in pointer-chasing codes and computes a prefetch schedule that overlaps serialized cache misses across separate chains. Our analysis focuses on static traversals. We also propose using speculation to identify independent pointer chains in dynamic traversals. Second, we present the design of a prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to our scheduling algorithm. Finally, we conduct an experimental evaluation of multi-chain prefetching and compare its performance against two existing techniques: jump pointer prefetching and prefetch arrays. Our results show that multi-chain prefetching improves execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite and by 8% across four SPECInt CPU2000 benchmarks. Multi-chain prefetching also outperforms jump pointer prefetching and prefetch arrays by 28% on Olden and by 12% on SPECInt. Furthermore, speculation can enable multi-chain prefetching for some dynamic traversal codes, but our technique loses its effectiveness when the pointer-chain traversal order is unpredictable. Finally, we also show that combining multi-chain prefetching with prefetch arrays can potentially provide higher performance than either technique alone.
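To make the idea of inter-chain memory parallelism concrete, the following is a toy latency model, not the scheduling algorithm from the paper. It assumes that every node access misses in the cache, that a miss costs a fixed latency, and that the memory system can service misses from different chains concurrently without limit. The function names and the example numbers are hypothetical, chosen only for illustration.

```python
def serialized_miss_time(chain_lengths, miss_latency):
    """Cycles spent on misses when chains are walked one after another.

    Within a chain, each node's address comes from the previous node,
    so every miss waits for the one before it.
    """
    return sum(chain_lengths) * miss_latency


def overlapped_miss_time(chain_lengths, miss_latency):
    """Cycles spent on misses when independent chains are walked concurrently.

    Each chain is still serial internally, but misses from different
    chains overlap, so the longest chain bounds the total (assumption:
    unlimited concurrent misses).
    """
    return max(chain_lengths, default=0) * miss_latency


if __name__ == "__main__":
    chains = [100, 100, 100, 100]   # four independent lists of 100 nodes (assumed)
    latency = 200                   # cycles per cache miss (assumed)
    print(serialized_miss_time(chains, latency))   # 80000
    print(overlapped_miss_time(chains, latency))   # 20000
```

Under these assumptions the miss time drops from the sum of the chain lengths to the length of the longest chain, which is the kind of overlap a prefetch schedule across separate chains aims for. Real hardware limits the number of outstanding misses, so the actual benefit would be smaller.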
