ACM Transactions on Computer Systems

A Study of Source-Level Compiler Algorithms for Automatic Construction of Pre-Execution Code



Abstract

Pre-execution is a promising latency tolerance technique that uses one or more helper threads running in spare hardware contexts ahead of the main computation to trigger long-latency memory operations early, hence absorbing their latency on behalf of the main computation. This article investigates several source-to-source C compilers for extracting pre-execution thread code automatically, thus relieving the programmer or hardware from this onerous task. We present an aggressive profile-driven compiler that employs three powerful algorithms for code extraction. First, program slicing removes non-critical code for computing cache-missing memory references. Second, prefetch conversion replaces blocking memory references with non-blocking prefetch instructions to minimize pre-execution thread stalls. Finally, speculative loop parallelization generates thread-level parallelism to tolerate the latency of blocking loads. In addition, we present four "reduced" compilers that employ less aggressive algorithms to simplify compiler implementation. Our reduced compilers rely on back-end code optimizations rather than program slicing to remove non-critical code, and use compile-time heuristics rather than profiling to approximate runtime information (e.g., cache-miss and loop-trip counts). We prototype our algorithms on the Stanford University Intermediate Format (SUIF) framework and a publicly available program slicer, called Unravel [Lyle and Wallace 1997]. Using our prototype, we undertake a performance evaluation of our compilers on a detailed architectural simulator of an 8-way out-of-order SMT processor with 4 hardware contexts, and 13 applications selected from the SPEC and Olden benchmark suites. Our most aggressive compiler improves the performance of 10 out of 13 applications, reducing execution time by 20.9%. Across all 13 applications, our aggressive compiler achieves a harmonic average speedup of 17.6%. For our reduced compilers, eliminating program slicing and relying on back-end optimizations degrades performance minimally, suggesting that effective pre-execution compilers can be built without program slicing. Furthermore, without cache-miss profiles, we still achieve good speedup, 15.5%, but without loop-trip count profiles, we achieve a speedup of only 7.7%. Finally, our results show compiler-based pre-execution can benefit multiprogrammed workloads. Simultaneously executing applications achieve higher throughput with pre-execution compared to no pre-execution. Due to contention for hardware contexts, however, time-slicing outperforms simultaneous execution in some cases where individual applications make heavy use of pre-execution threads.
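As a rough illustration of the first two algorithms, the sketch below shows, in plain C, the kind of helper-thread code such a compiler aims to construct. The function names, the indirect-array access pattern, and the use of the GCC/Clang __builtin_prefetch intrinsic are assumptions made here for illustration only; they are not code generated by the paper's compiler.

/* ---- illustrative sketch, not taken from the paper ---- */
#include <stddef.h>

/* Main computation: the irregular read data[index[i]] misses the cache
   frequently and stalls the main thread on a long-latency load. */
double main_loop(const int *index, const double *data, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += data[index[i]];          /* cache-missing memory reference */
    return sum;
}

/* Pre-execution slice for the same loop, intended to run in a spare SMT
   hardware context ahead of main_loop.  Program slicing keeps only the
   address computation (index[i]); the accumulation into sum is
   non-critical and is dropped.  Prefetch conversion replaces the
   blocking reference data[index[i]] with a non-blocking prefetch so the
   helper thread does not stall on it; __builtin_prefetch stands in for
   whatever prefetch instruction the target machine provides. */
void preexec_slice(const int *index, const double *data, size_t n)
{
    for (size_t i = 0; i < n; i++)
        __builtin_prefetch(&data[index[i]], /* read */ 0, /* low locality */ 1);
}

For loops whose address computation itself blocks (for example, pointer chasing through a linked list), prefetch conversion alone does not help; that is the case the paper's third algorithm, speculative loop parallelization, targets by spreading iterations across several helper threads. Thread spawning, synchronization with the main computation, and throttling are omitted from this sketch.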


