Efficient optimization of memory accesses in parallel programs.


Abstract

The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low-power cores. As multi-core processors become ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges.

The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs, and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits more scalar replacement transformation opportunities than many existing memory models.

The low-level optimizations include a novel approach to register allocation that retains the compile time and space efficiency of Linear Scan, while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible.

Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. When compared to the Graph Coloring register allocator in the GCC compiler framework, our allocator resulted in an execution time improvement of up to 5.8%, with an average improvement of 2.3%, on a POWER5 processor.

With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
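
As a rough illustration of what scalar replacement for load elimination means in general, the sketch below shows a loop before and after repeated field loads (getfield operations) are replaced by local scalars. It is a hand-written, hypothetical example and not the thesis's algorithm: the class and method names are invented here, and the rewrite is only valid under the assumptions that `p` is non-null and that nothing else (another thread or a called method) writes `p.x` or `p.y` while the loop runs. Establishing facts of that kind is the role the abstract assigns to the side-effect and May-Happen-in-Parallel analyses.

```java
public class ScalarReplacementSketch {
    // Hypothetical data class used only for this illustration.
    static final class Point {
        int x;
        int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Before: every iteration performs two field loads (p.x and p.y).
    static int before(Point p, int n) {
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += p.x * i + p.y;
        }
        return sum;
    }

    // After: the field values are loaded once into local scalars.
    // Assumes p is non-null and that no other code writes p.x or p.y
    // during the loop; otherwise the two versions could differ.
    static int after(Point p, int n) {
        int x = p.x;
        int y = p.y;
        int sum = 0;
        for (int i = 0; i < n; i++) {
            sum += x * i + y;
        }
        return sum;
    }

    public static void main(String[] args) {
        Point p = new Point(3, 4);
        // Both versions compute the same value when the assumptions hold.
        System.out.println(before(p, 10) == after(p, 10)); // prints true
    }
}
```

In the single-threaded call in `main`, the two methods return the same result, while `after` performs two field loads in total instead of two per iteration.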

Bibliographic record

  • Author: Barik, Rajkishore
  • Author affiliation: Rice University
  • Degree-granting institution: Rice University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2010
  • Pages: 206
  • Format: PDF
  • Language: English
