ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism


Abstract

Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory latency caused by cache hits and misses on non-blocking architectures. In contrast, balanced scheduling schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. By scheduling independent instructions behind loads based on what the program can provide, rather than what the implementation stipulates in the best case (i.e., a cache hit), balanced scheduling can hide variations in memory latencies more effectively.

Since its success depends on the amount of instruction-level parallelism in the code, balanced scheduling should perform even better when more parallelism is available. In this study, we combine balanced scheduling with three compiler optimizations that increase instruction-level parallelism: loop unrolling, trace scheduling, and cache locality analysis. Using code generated for the DEC Alpha by the Multiflow compiler, we simulated a non-blocking processor architecture that closely models the Alpha 21164. Our results show that balanced scheduling benefits from all three optimizations, producing average speedups that range from 1.15 to 1.40 across the optimizations. More importantly, because of its ability to tolerate variations in load interlocks, it improves its advantage over traditional scheduling. Without the optimizations, balanced scheduled code is, on average, 1.05 times faster than that generated by a traditional scheduler; with them, its lead increases to 1.18.
