Efficient Reuse Distance Analysis of Multicore Scaling for Loop-Based Parallel Programs

MENG-JU WU; DONALD YEUNG


Abstract

Reuse Distance (RD) analysis is a powerful memory analysis tool that can potentially help architects study multicore processor scaling. One key obstacle, however, is that multicore RD analysis requires measuring Concurrent Reuse Distance (CRD) and Private-LRU-stack Reuse Distance (PRD) profiles across thread-interleaved memory reference streams. Sensitivity to memory interleaving makes CRD and PRD profiles architecture dependent, preventing them from analyzing different processor configurations. For loop-based parallel programs, CRD and PRD profiles shift coherently across RD values with core count scaling because interleaving threads are symmetric. Simple techniques can predict such shifting, making the analysis of numerous multicore configurations from a small set of CRD and PRD profiles feasible. Given the ubiquity of parallel loops, such techniques will be extremely valuable for studying future large multicore designs. This article investigates using RD analysis to efficiently analyze multicore cache performance for loop-based parallel programs, making several contributions. First, we provide an in-depth analysis on how CRD and PRD profiles change with core count scaling. Second, we develop techniques to predict CRD and PRD profile scaling, in particular employing reference groups [Zhong et al. 2003] to predict coherent shift, demonstrating 90% or greater prediction accuracy. Third, our CRD and PRD profile analyses define two application parameters with architectural implications: C_core is the minimum shared cache capacity that "contains" locality degradation due to core count scaling, and C_share is the capacity at which shared caches begin to provide a cache-miss reduction compared to private caches. And fourth, we apply CRD and PRD profiles to analyze multicore cache performance. When combined with existing problem scaling prediction, our techniques can predict shared LLC MPKI (private L2 cache MPKI) to within 10.7% (13.9%) of simulation across 1,728 (1,440) configurations using only 36 measured CRD (PRD) profiles.
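
As a rough illustration of the quantities named in the abstract, the sketch below computes LRU stack (reuse) distances for a toy trace, once per thread and once over the interleaved stream of two symmetric threads. It is not the authors' tool: the function names, the round-robin interleaving, and the toy traces are all assumptions made for illustration. Because the two threads share no data, the per-thread distances stand in for PRD without modeling coherence effects such as invalidation or replication.

```python
def reuse_distances(trace):
    """LRU stack distance of each reference; None marks a first touch."""
    stack = []  # most recently used block sits at the end
    out = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Number of distinct blocks touched since the last use of `block`.
            out.append(len(stack) - 1 - pos)
            stack.pop(pos)
        else:
            out.append(None)
        stack.append(block)
    return out


def interleave(threads):
    """Round-robin interleaving of per-thread traces (an idealized assumption)."""
    out = []
    for i in range(max(len(t) for t in threads)):
        for t in threads:
            if i < len(t):
                out.append(t[i])
    return out


def misses(distances, capacity):
    """Misses in a fully associative LRU cache holding `capacity` blocks."""
    return sum(1 for d in distances if d is None or d >= capacity)


# Two symmetric threads running the same loop over private data (hypothetical).
t0 = ["a", "b", "a", "b"]
t1 = ["x", "y", "x", "y"]

prd0 = reuse_distances(t0)                   # per-thread (private stack) distances
crd = reuse_distances(interleave([t0, t1]))  # distances over the interleaved stream

print(prd0)  # [None, None, 1, 1]
print(crd)   # [None, None, None, None, 3, 3, 3, 3]
print(misses(crd, 4), misses(crd, 2))  # 4 8
```

In this toy case every reuse moves from distance 1 in the per-thread profile to distance 3 in the interleaved profile, which is the kind of uniform shift across RD values that the abstract describes for symmetric threads. Dividing a miss count by the number of thousands of instructions executed would give MPKI; instruction counts are not modeled here.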

Bibliographic record

Source
ACM Transactions on Computer Systems | 2013, Issue 1 | pp. 1.1-1.37 | 37 pages
Authors
MENG-JU WU; DONALD YEUNG
Affiliations

University of Maryland at College Park, MD;

University of Maryland at College Park, MD;

Indexed in: Science Citation Index (SCI, USA); Engineering Index (EI, USA)
Original format: PDF
Language: English
Keywords
cache performance; reuse distance; chip multiprocessors

Similar literature

1. Identifying Power-Efficient Multicore Cache Hierarchies via Reuse Distance Analysis [J]. Badamo Michael, Casarona Jeff, Zhao Minshu. ACM Transactions on Computer Systems, 2016, Issue 1.
2. Studying Multicore Processor Scaling via Reuse Distance Analysis [J]. Meng-Ju Wu, Minshu Zhao, Donald Yeung. Computer Architecture News, 2013, Issue 3.
3. Parallelization and scalability analysis of inverse factorization using the chunks and tasks programming model [J]. Artemov Anton G., Rudberg Elias, Rubensson Emanuel H. Parallel Computing, 2019, Issue Nova.
4. Coherent Profiles: Enabling Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs [C]. Wu Meng-Ju, Yeung Donald. 2011 International Conference on Parallel Architectures and Compilation Techniques, 2011.
5. Studying the Impact of Multicore Processor Scaling on Cache Coherence Directories via Reuse Distance Analysis [D]. Zhao, Minshu. 2015.
6. A parallel and sensitive software tool for methylation analysis on multicore platforms [O]. Joaquín Tárraga, Mariano Pérez, Juan M. Orduña.
7. Efficient Reuse Distance Analysis of Multicore Scaling for Loop-based Parallel Programs [O]. 2013.
8. Parallel Large-scale Semidefinite Programming for Strong Electron Correlation: Using Correlation and Entanglement in the Design of Efficient Energy-Transfer Mechanisms [R]. Mazziotti, D. A. 2014.
