ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

Improving balanced scheduling with compiler optimizations that increase instruction-level parallelism


Abstract

Traditional list schedulers order instructions based on an optimistic estimate of the load latency imposed by the hardware and therefore cannot respond to variations in memory latency caused by cache hits and misses on non-blocking architectures. In contrast, balanced scheduling schedules instructions based on an estimate of the amount of instruction-level parallelism in the program. By scheduling independent instructions behind loads based on what the program can provide, rather than what the implementation stipulates in the best case (i.e., a cache hit), balanced scheduling can hide variations in memory latencies more effectively.

Since its success depends on the amount of instruction-level parallelism in the code, balanced scheduling should perform even better when more parallelism is available. In this study, we combine balanced scheduling with three compiler optimizations that increase instruction-level parallelism: loop unrolling, trace scheduling, and cache locality analysis. Using code generated for the DEC Alpha by the Multiflow compiler, we simulated a non-blocking processor architecture that closely models the Alpha 21164. Our results show that balanced scheduling benefits from all three optimizations, producing average speedups that range from 1.15 to 1.40 across the optimizations. More importantly, because of its ability to tolerate variations in load interlocks, it improves its advantage over traditional scheduling. Without the optimizations, balanced scheduled code is, on average, 1.05 times faster than that generated by a traditional scheduler; with them, its lead increases to 1.18.
