Data placement optimizations for multilevel cache hierarchies.

Abstract

As compiler optimizations have increasingly focused on the memory hierarchy, a variety of efforts have attempted to reduce misses in first-level instruction and data caches. Placing code to reduce instruction cache misses, and placing data to reduce data cache misses, have been shown to benefit a variety of application programs. However, most of this work has been limited to reducing first-level cache misses.

Careful examination of the characteristics of modern computer architectures reveals opportunities for a data placement optimization framework that pursues several kinds of performance improvement at once:

- Cache hierarchies have recently grown as deep as three levels, each with a different miss penalty, so misses need to be reduced at every level to maximize performance.
- Reducing TLB (translation lookaside buffer) misses and the number of virtual memory pages in use is also desirable.
- Global and local variables can be accessed through addressing modes of differing costs, and the cheaper modes can be used more often if the data placement optimization takes this goal into account.

A multi-goal data placement framework has been developed to enable all of these optimizations. It uses a novel method of static data affinity analysis, followed by a data placement optimization based on hierarchical graph partitioning and local refinement. Together, these reduce cache misses throughout the cache hierarchy while also increasing page and TLB locality and enabling the addressing mode and bus cycle optimizations. An original method of characterizing the cache and TLB hierarchy parameters needed for profiling and optimization, using hardware performance counters, helps make the entire data placement framework practical and portable. The static data affinity analysis avoids the practical difficulties of past research that relied on expensive dynamic profiling runs.
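The abstract does not spell out how static affinity is computed. Below is a minimal sketch of the general idea under two stated assumptions. First, the program is summarized as a hypothetical list of code regions, each given by its loop nesting depth and the set of global variables it references. Second, it uses a common compiler heuristic in which each loop nesting level multiplies estimated execution frequency by 10. Two variables gain affinity whenever they appear in the same region, weighted by that region's estimated frequency. `static_affinity`, `LOOP_WEIGHT` and the example variables are invented for illustration and are not the thesis's actual analysis.

```python
from collections import defaultdict
from itertools import combinations

# Assumed heuristic: each loop nesting level multiplies estimated frequency by 10.
LOOP_WEIGHT = 10


def static_affinity(regions):
    """Estimate pairwise variable affinity without a profiling run.

    regions: iterable of (loop_depth, set_of_variable_names), a hypothetical
    per-region summary that a compiler pass could extract from its IR.
    Returns {(var_a, var_b): weight} with var_a < var_b.
    """
    affinity = defaultdict(int)
    for depth, variables in regions:
        weight = LOOP_WEIGHT ** depth  # static frequency estimate for the region
        # Every pair referenced together in this region gains affinity.
        for a, b in combinations(sorted(variables), 2):
            affinity[(a, b)] += weight
    return dict(affinity)


# Illustrative input: one straight-line region, one doubly nested loop, one single loop.
regions = [
    (0, {"count", "total"}),
    (2, {"buf", "idx", "total"}),
    (1, {"buf", "idx"}),
]
aff = static_affinity(regions)
print(aff)
# {('count', 'total'): 1, ('buf', 'idx'): 110, ('buf', 'total'): 100, ('idx', 'total'): 100}
```

The resulting weighted graph, with variables as nodes and affinities as edge weights, is the kind of input that a graph-partitioning placement step consumes.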
The hierarchical graph partitioning approach to data placement can make use of Chaco, a well-tested, off-the-shelf graph partitioning library. Extensive measurements using timings and cache simulations for Sun UltraSparc-II machines demonstrate the effectiveness of the data placement optimizations.
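Chaco's interface is not described in the abstract, so the sketch below does not use it. As a simpler, clearly substituted technique, it runs greedy agglomerative clustering at each level of the hierarchy:

- Variables are merged along their highest-affinity edges into groups whose total size fits a capacity.
- This happens first at page granularity, and then again within each page group at cache-line granularity.
- The variables are emitted in the resulting order.

The function names, the 4096-byte page and 64-byte line capacities, and the example sizes and affinities are all assumptions for illustration. The sketch also omits the local refinement step and the alignment padding a real placement would need.

```python
def greedy_cluster(items, sizes, affinity, capacity):
    """Greedily merge items into clusters whose total size stays within capacity.

    Edges are visited from highest to lowest affinity weight; two clusters are
    merged only if the result still fits. Items larger than capacity stay alone.
    """
    cluster_of = {v: i for i, v in enumerate(items)}
    members = {i: [v] for i, v in enumerate(items)}
    used = {i: sizes[v] for i, v in enumerate(items)}
    # Only consider edges whose endpoints both belong to this (sub)problem.
    edges = sorted(
        ((w, a, b) for (a, b), w in affinity.items()
         if a in cluster_of and b in cluster_of),
        reverse=True,
    )
    for _, a, b in edges:
        ca, cb = cluster_of[a], cluster_of[b]
        if ca != cb and used[ca] + used[cb] <= capacity:
            for v in members[cb]:
                cluster_of[v] = ca
            members[ca].extend(members.pop(cb))
            used[ca] += used.pop(cb)
    return list(members.values())


def hierarchical_layout(items, sizes, affinity, capacities):
    """Return a placement order; capacities run from the outermost level inward."""
    if not capacities:
        return list(items)
    order = []
    for group in greedy_cluster(items, sizes, affinity, capacities[0]):
        # Recurse so each page-level group is split further into line-level groups.
        order.extend(hierarchical_layout(group, sizes, affinity, capacities[1:]))
    return order


PAGE_BYTES, LINE_BYTES = 4096, 64  # assumed granularities
sizes = {"count": 8, "total": 8, "buf": 48, "idx": 8}  # illustrative sizes in bytes
affinity = {("count", "total"): 1, ("buf", "idx"): 110,
            ("buf", "total"): 100, ("idx", "total"): 100}
order = hierarchical_layout(list(sizes), sizes, affinity, [PAGE_BYTES, LINE_BYTES])
print(order)  # ['count', 'buf', 'idx', 'total']
```

Here `buf`, `idx` and `total` end up in one 64-byte line-level group, while the weakly related `count` is kept apart. The greedy merge was chosen only because it is short and deterministic. A dedicated partitioner such as Chaco aims for balanced partitions with minimal cut weight, which this sketch does not attempt.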

Bibliographic Record

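Author: Coleman, Clark L.
Author affiliation: University of Virginia.
Degree-granting institution: University of Virginia.
Discipline: Computer Science.
Degree: Ph.D.
Year: 2004
Pages: 172
Original format: PDF
Text language: English
CLC classification: Automation technology, computer technology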

Similar Documents

1. On-Chip Caches Built on Multilevel Spin-Transfer Torque RAM Cells and Its Optimizations [J]. Yiran Chen, Weng-Fai Wong, Hai Li. ACM Journal on Emerging Technologies in Computing Systems, 2013, No. 2.

2. Cache Investment: Integrating Query Optimization and Distributed Data Placement [J]. Donald Kossmann, Michael J. Franklin, Gerhard Drasch. ACM Transactions on Database Systems, 2000, No. 4.

3. A graph theoretic approach to cache-conscious placement of data for direct mapped caches [J]. Mirza Beg, Peter van Beek. ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages, 2010, No. 8.

4. XRootd, disk-based, caching proxy for optimization of data access, data placement and data replication [C]. L. A. T. Bauerdick, K. Bloom, B. Bockelman. Conference on Computing in High Energy and Nuclear Physics, 2014.

5. Cache Analysis and Techniques for Optimizing Data Movement Across the Cache Hierarchy for HPC Workloads [D]. Deshpande, Aditya Madhusudan. 2019.

6. Strategies of data layout and cache writing for input-output optimization in high performance scientific computing: Applications to the forward electrocardiographic problem [O]. Louie Cardone-Noott, Blanca Rodriguez, Alfonso Bueno-Orovio. 2012.

7. Cache Investment: Integrating Query Optimization and Distributed Data Placement [O]. Donald Kossmann, Michael J. Franklin, Gerhard Drasch. 2000.
