Compiler optimization of value communication for thread-level speculation.

Abstract

In the context of Thread-Level Speculation (TLS), inter-thread value communication is the key to efficient parallel execution. From the compiler's perspective, TLS supports two forms of inter-thread value communication: speculation and synchronization. Speculation allows for maximum parallel overlap when it succeeds, but becomes costly when it fails. Synchronization, on the other hand, introduces a fixed cost regardless of whether the dependence actually occurs or not. The fixed cost of synchronization is determined by the critical forwarding path, which is the time between when a thread first receives a value from its predecessor to when a new value is generated and forwarded to its successor. In the baseline implementation used in this dissertation, we synchronize all register-resident values and speculate on all memory-resident values. However, this naive approach yields little performance gain due to the excessive cost from inter-thread value communication. The goal of this dissertation is to develop compiler-based techniques to reduce the cost of inter-thread value communication and improve the overall program performance.

This dissertation proposes to use the compiler to orchestrate inter-thread value communication for both memory-resident and register-resident values. To improve the efficiency of inter-thread value communication, the compiler must first decide whether to synchronize or to speculate on a potential data dependence based on how frequently the dependence occurs. If synchronization is necessary, the compiler will then insert the corresponding signal and wait instructions, creating a point-to-point path to forward the values involved in the dependence.
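The synchronize-or-speculate decision described above can be illustrated with a small expected-cost comparison. This is only a sketch under stated assumptions, not the dissertation's actual heuristic: the `Dependence` record, the `choose_mechanism` function, and the two cost parameters (`sync_cost`, `violation_cost`) are hypothetical names introduced here, and the rule simply compares the fixed cost of synchronization against the failure penalty of speculation weighted by how often the dependence occurs.

```python
from dataclasses import dataclass


@dataclass
class Dependence:
    """A potential inter-thread data dependence (hypothetical record)."""
    name: str
    # Fraction of loop iterations in which the dependence actually occurs,
    # e.g. as measured by profiling. Assumed to lie in [0, 1].
    frequency: float


def choose_mechanism(dep: Dependence, sync_cost: float, violation_cost: float) -> str:
    """Pick 'synchronize' or 'speculate' for one dependence.

    Assumed cost model (illustrative only):
      - synchronization pays sync_cost on every iteration;
      - speculation pays violation_cost only when the dependence occurs.
    """
    if not 0.0 <= dep.frequency <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    expected_speculation_cost = dep.frequency * violation_cost
    if expected_speculation_cost > sync_cost:
        return "synchronize"
    return "speculate"


# Example: a frequent dependence is synchronized, a rare one is speculated on.
frequent = Dependence("loop_counter", frequency=0.5)
rare = Dependence("error_flag", frequency=0.01)
decisions = {
    d.name: choose_mechanism(d, sync_cost=10, violation_cost=200)
    for d in (frequent, rare)
}
```

With these made-up numbers, the frequent dependence has an expected speculation cost of 100 cycles against a synchronization cost of 10, so it is synchronized; the rare one costs 2 cycles on average, so it is left to speculation.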
Because synchronization could serialize execution by stalling the consumer thread, we use the compiler to avoid such stalling by applying novel dataflow analyses to schedule instructions to shrink the critical forwarding path.

This dissertation reports the performance impact of several compiler-based value communication optimization techniques on a four-processor single-chip multiprocessor that has been extended to support thread-level speculation. Relative to the performance of the original sequential program executing on a single processor, for the set of loops selected to maximize program performance, parallel execution with the proposed baseline implementation results in 1% performance degradation for integer benchmarks and 21% performance improvement for floating-point benchmarks, while with the optimization techniques we developed, parallel execution achieves 22% and 42% performance improvement for integer benchmarks and floating-point benchmarks, respectively.
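To see why shrinking the critical forwarding path matters, the following toy timing model may help. It is a simplified sketch, not the simulator or machine model used in the dissertation: the function `parallel_time` and all of its parameters are hypothetical, every loop iteration is assumed to take the same number of cycles, and each iteration is assumed to wait for one forwarded value at offset `wait_at` and signal its successor at offset `signal_at`.

```python
def parallel_time(num_epochs, num_procs, epoch_len, wait_at, signal_at):
    """Total cycles to run num_epochs iterations on num_procs processors.

    Assumed model (illustrative only):
      - iteration i starts when the processor that ran iteration
        i - num_procs becomes free;
      - it stalls at offset wait_at until iteration i - 1 has signalled;
      - it signals its successor at offset signal_at.
    The critical forwarding path is signal_at - wait_at.
    """
    if not 0 <= wait_at <= signal_at <= epoch_len:
        raise ValueError("need 0 <= wait_at <= signal_at <= epoch_len")
    forwarding_path = signal_at - wait_at
    signal_time = []  # cycle at which each iteration forwards its value
    finish_time = []  # cycle at which each iteration completes
    for i in range(num_epochs):
        start = finish_time[i - num_procs] if i >= num_procs else 0
        ready = start + wait_at  # earliest point the wait can be reached
        if i > 0:
            ready = max(ready, signal_time[i - 1])  # stall for predecessor
        sent = ready + forwarding_path
        signal_time.append(sent)
        finish_time.append(sent + (epoch_len - signal_at))
    return max(finish_time)


# Eight 100-cycle iterations on four processors.
late_signal = parallel_time(8, 4, 100, wait_at=0, signal_at=100)
early_signal = parallel_time(8, 4, 100, wait_at=0, signal_at=20)
```

In this model, signalling only at the end of each iteration (a 100-cycle forwarding path) gives 800 cycles, the same as sequential execution, while moving the signal up to cycle 20 gives 260 cycles. Scheduling the producing instructions earlier is what lets the iterations overlap.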

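Bibliographic record

  • Author

    Zhai, Antonia

  • Author affiliation

    Carnegie Mellon University

  • Degree-granting institution: Carnegie Mellon University
  • Discipline: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 164 p.
  • Total pages: 164
  • Original format: PDF
  • Language of text: eng
  • Chinese Library Classification: Automation technology, computer technology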