Efficient optimization of memory accesses in parallel programs.

Abstract

The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low-power cores. As multi-core processors become ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges.

The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs, and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits several additional scalar replacement opportunities compared to many existing memory models.

The low-level optimizations include a novel approach to register allocation that retains the compile-time and space efficiency of Linear Scan while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible.

Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to the Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3%, on a POWER5 processor.

Based on these experimental evaluations and the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
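To make the idea of scalar replacement for load elimination concrete, here is a minimal source-level sketch. It is not the thesis's algorithm, which operates inside a compiler's intermediate representation. The class `Point`, its `loads` counter, and both functions are hypothetical names introduced only for illustration. The counter stands in for a dynamic count of field loads (getfield operations), and the transformation is applied by hand.

```python
class Point:
    """Hypothetical object whose field reads are counted,
    standing in for dynamic getfield operations."""

    def __init__(self, x):
        self._x = x
        self.loads = 0  # number of simulated field loads so far

    @property
    def x(self):
        # Every read of p.x is counted as one field load.
        self.loads += 1
        return self._x


def sum_naive(p, n):
    # Reads the field on every iteration: n field loads.
    total = 0
    for _ in range(n):
        total += p.x
    return total


def sum_scalar_replaced(p, n):
    # The field is loaded once into a local scalar and reused: 1 field load.
    # Assumption: nothing else writes p.x while the loop runs.
    x = p.x
    total = 0
    for _ in range(n):
        total += x
    return total


if __name__ == "__main__":
    a, b = Point(3), Point(3)
    print(sum_naive(a, 1000), a.loads)
    print(sum_scalar_replaced(b, 1000), b.loads)
```

Both functions return the same result, but the second performs a single field load. In a parallel program the rewrite is legal only if no concurrently running code can modify the field in between, which is why the legality of such a transformation depends on analyses such as May-Happen-in-Parallel and Side-Effect analysis and on the memory model in use.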

Bibliographic details

Author: Barik, Rajkishore
Affiliation: Rice University
Degree-granting institution: Rice University
Subject: Computer Science
Degree: Ph.D.
Year: 2010
Pages: 206
Original format: PDF
Language: English

