
Efficient optimization of memory accesses in parallel programs


Abstract

The power, frequency, and memory wall problems have caused a major shift in mainstream computing by introducing processors that contain multiple low-power cores. As multi-core processors become ubiquitous, software trends in both parallel programming languages and dynamic compilation have added new challenges to program compilation for multi-core processors. This thesis proposes a combination of high-level and low-level compiler optimizations to address these challenges.

The high-level optimizations introduced in this thesis include new approaches to May-Happen-in-Parallel analysis and Side-Effect analysis for parallel programs, and a novel parallelism-aware Scalar Replacement for Load Elimination transformation. A new Isolation Consistency (IC) memory model is described that permits more scalar replacement opportunities than many existing memory models.

The low-level optimizations include a novel approach to register allocation that retains the compile-time and space efficiency of Linear Scan while delivering runtime performance superior to both Linear Scan and Graph Coloring. The allocation phase is modeled as an optimization problem on a Bipartite Liveness Graph (BLG) data structure. The assignment phase focuses on reducing the number of spill instructions by using register-to-register move and exchange instructions wherever possible.

Experimental evaluations of our scalar replacement for load elimination transformation in the Jikes RVM dynamic compiler show decreases in dynamic counts for getfield operations of up to 99.99%, and performance improvements of up to 1.76x on 1 core and 1.39x on 16 cores, when compared with the load elimination algorithm available in Jikes RVM. A prototype implementation of our BLG register allocator in Jikes RVM demonstrates runtime performance improvements of up to 3.52x relative to Linear Scan on an x86 processor. Compared with the Graph Coloring register allocator in the GCC compiler framework, our allocator improved execution time by up to 5.8%, with an average improvement of 2.3%, on a POWER5 processor.

With the experimental evaluations combined with the foundations presented in this thesis, we believe that the proposed high-level and low-level optimizations are useful in addressing some of the new challenges emerging in the optimization of parallel programs for multi-core architectures.
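As an illustration of the general idea behind scalar replacement for load elimination (not the thesis's algorithm), the minimal Java sketch below shows a method before and after a redundant field load is replaced by a local scalar. The class and method names are hypothetical. The rewrite is only valid under the assumption that no other task can write `p.x` between the two reads; establishing that is the role the abstract assigns to May-Happen-in-Parallel and Side-Effect analysis, which this sketch does not implement.

```java
// Hypothetical illustration; names are not taken from the thesis or Jikes RVM.
class ScalarReplacementSketch {
    static final class Point {
        int x;
        Point(int x) { this.x = x; }
    }

    // Before: the field p.x is loaded twice (two getfield operations).
    static int before(Point p) {
        int a = p.x + 1;   // first getfield
        int b = p.x * 2;   // second getfield, redundant if p.x cannot change in between
        return a + b;
    }

    // After: the field is loaded once into a local scalar and reused.
    // Assumption: no concurrent write to p.x can occur between the two uses.
    static int after(Point p) {
        int t = p.x;       // single getfield
        int a = t + 1;
        int b = t * 2;
        return a + b;
    }

    public static void main(String[] args) {
        Point p = new Point(5);
        // Both versions compute the same value when p.x is not modified concurrently.
        System.out.println(before(p) == after(p));
    }
}
```

In a sequential program the two methods are always equivalent; in a parallel program the compiler may apply the rewrite only where the memory model and the analyses allow it.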

Bibliographic record

  • Author

    Barik Rajkishore

  • Affiliation
  • Year 2010
  • Total pages
  • Original format PDF
  • Language eng
  • CLC classification
