Presents multi-chain prefetching, a technique that utilizesoffline analysis and a hardware prefetch engine to prefetch multipleindependent pointer chains simultaneously, thus exploiting inter-chainmemory parallelism for the purpose of memory latency tolerance. Thispaper makes three contributions. First, we introduce a schedulingalgorithm that identifies independent pointer chains in pointer-chasingcodes and computes a prefetch schedule that overlaps serialized cachemisses across separate chains. Our analysis focuses an statictraversals. We also propose using speculation to identify independentpointer chains in dynamic traversals. Second, we present the design of aprefetch engine that traverses pointer-based data structures andoverlaps multiple pointer chains according to our scheduling algorithm.Finally, we conduct an experimental evaluation of multi-chainprefetching and compare its performance against two existing techniques:jump pointer prefetching and prefetch arrays. Our results show thatmulti-chain prefetching improves the execution time by 40% across sixpointer-chasing kernels from the Olden benchmark suite and by 8% acrossfour SPECInt CPU2000 benchmarks. Multi-chain prefetching alsooutperforms jump pointer prefetching and prefetch arrays by 28% onOlden, and by 12% on SPECInt. Furthermore, speculation can enablemulti-chain prefetching for some dynamic traversal codes, but ourtechnique loses its effectiveness when the pointer-chain traversal orderis unpredictable. Finally, we also show that combining multi-chainprefetching with prefetch arrays can potentially provide higherperformance than either technique alone
展开▼