A General Framework for Prefetch Scheduling in Linked Data Structures and Its Application to Multi-chain Prefetching

SEUNGRYUL CHOI; NICHOLAS KOHOUT; SUMIT PAMNANI; DONGKEUN KIM; DONALD YEUNG

Abstract

Pointer-chasing applications tend to traverse composite data structures consisting of multiple independent pointer chains. While the traversal of any single pointer chain leads to the serialization of memory operations, the traversal of independent pointer chains provides a source of memory parallelism. This article investigates exploiting such interchain memory parallelism for the purpose of memory latency tolerance, using a technique called multi-chain prefetching. Previous works [Roth et al. 1998; Roth and Sohi 1999] have proposed prefetching simple pointer-based structures in a multi-chain fashion. However, our work enables multi-chain prefetching for arbitrary data structures composed of lists, trees, and arrays. This article makes five contributions in the context of multi-chain prefetching. First, we introduce a framework for compactly describing linked data structure (LDS) traversals, providing the data layout and traversal code work information necessary for prefetching. Second, we present an off-line scheduling algorithm for computing a prefetch schedule from the LDS descriptors that overlaps serialized cache misses across separate pointer-chain traversals. Our analysis focuses on static traversals. We also propose using speculation to identify independent pointer chains in dynamic traversals. Third, we propose a hardware prefetch engine that traverses pointer-based data structures and overlaps multiple pointer chains according to the computed prefetch schedule. Fourth, we present a compiler that extracts LDS descriptors via static analysis of the application source code, thus automating multi-chain prefetching. Finally, we conduct an experimental evaluation of compiler-instrumented multi-chain prefetching and compare it against jump pointer prefetching [Luk and Mowry 1996], prefetch arrays [Karlsson et al. 2000], and predictor-directed stream buffers (PSB) [Sherwood et al. 2000]. 
Our results show compiler-instrumented multi-chain prefetching improves execution time by 40% across six pointer-chasing kernels from the Olden benchmark suite [Rogers et al. 1995], and by 3% across four SPECint2000 benchmarks. Compared to jump pointer prefetching and prefetch arrays, multi-chain prefetching achieves 34% and 11% higher performance for the selected Olden and SPECint2000 benchmarks, respectively. Compared to PSB, multi-chain prefetching achieves 27% higher performance for the selected Olden benchmarks, but PSB outperforms multi-chain prefetching by 0.2% for the selected SPECint2000 benchmarks. An ideal PSB with an infinite Markov predictor achieves comparable performance to multi-chain prefetching, coming within 6% across all benchmarks. Finally, speculation can enable multi-chain prefetching for some dynamic traversal codes, but our technique loses its effectiveness when the pointer-chain traversal order is highly dynamic.
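
The central observation of the abstract, that misses within one pointer chain serialize while misses across independent chains can overlap, can be illustrated with a toy cost model. The sketch below is not the article's scheduling algorithm or its LDS descriptor framework. It assumes every node visit is a cache miss of fixed latency, ignores all computation time, and introduces hypothetical names (`serialized_cycles`, `overlapped_cycles`, `max_outstanding`) and an arbitrary 100-cycle miss latency purely for illustration.

```python
from typing import List


def serialized_cycles(chain_lengths: List[int], miss_latency: int) -> int:
    """Chains are walked one after another. Every node visit is a dependent
    cache miss, so the miss latencies simply add up."""
    return sum(chain_lengths) * miss_latency


def overlapped_cycles(chain_lengths: List[int], miss_latency: int,
                      max_outstanding: int) -> int:
    """Independent chains are walked concurrently (toy model).

    In each round of `miss_latency` cycles, up to `max_outstanding` chains
    each advance by one node. Misses within a chain stay serialized.
    Chains with the most remaining nodes are advanced first.
    """
    if max_outstanding < 1:
        raise ValueError("max_outstanding must be at least 1")
    remaining = [n for n in chain_lengths if n > 0]
    rounds = 0
    while remaining:
        remaining.sort(reverse=True)  # longest remaining chains first
        for i in range(min(max_outstanding, len(remaining))):
            remaining[i] -= 1
        remaining = [n for n in remaining if n > 0]
        rounds += 1
    return rounds * miss_latency


if __name__ == "__main__":
    chains = [4, 4, 4, 4]   # four independent chains of four nodes each
    latency = 100           # assumed miss latency in cycles
    print(serialized_cycles(chains, latency))        # 1600
    print(overlapped_cycles(chains, latency, 4))     # 400
    print(overlapped_cycles(chains, latency, 2))     # 800
```

Under these assumptions, the overlapped traversal is bounded below by the longest single chain and by the total node count divided by the number of misses that may be in flight at once. With only one outstanding miss allowed, the model degenerates to the serialized case.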

Bibliographic details

Source: ACM Transactions on Computer Systems, 2004, Issue 2, pp. 214-280 (67 pages)
Authors: SEUNGRYUL CHOI; NICHOLAS KOHOUT; SUMIT PAMNANI; DONGKEUN KIM; DONALD YEUNG
Affiliation: Department of Computer Science, University of Maryland, College Park, MD 20742
Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Chinese Library Classification: Computing technology, computer technology
Keywords: data prefetching; memory parallelism; pointer-chasing code

Similar literature

1. Estimating Effective Prefetch Distance in Threaded Prefetching for Linked Data Structures [J]. Yan Huang, Zhi-Min Gu, Jie Tang. International Journal of Parallel Programming, 2012, Issue 5.
2. The Performance Optimization of Threaded Prefetching for Linked Data Structures [J]. Yan Huang, Jie Tang, Zhi-min Gu. International Journal of Parallel Programming, 2012, Issue 2.
3. Software prefetching using jump pointers in linked data structures [J]. ARUSHI ARORA, SWATI PRIYA, AKHIL KHARE. Oriental Journal of Computer Science and Technology, 2010, Issue 1.
4. Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems [C]. Eiman Ebrahimi, Onur Mutlu, Yale N. Patt. International Symposium on High Performance Computer Architecture, 2009.
5. Accurate, timely data prefetching for regular stream, linked data structure, and correlated miss pattern [D]. Liu, Gang. 2010.
6. Molecule database framework: a framework for creating database applications with chemical structure search capability [O]. Joos Kiener. 2013.
7. Techniques for Bandwidth-Efficient Prefetching of Linked Data Structures in Hybrid Prefetching Systems [O]. Eiman Ebrahimi, Onur Mutlu, Yale N. Patt. 2008.
