首页> 外文期刊>ACM Transactions on Programming Languages and Systems >The Pluto plus Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests
【24h】

The Pluto plus Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

机译:冥王星加算法:仿射环嵌套并行化和局部优化的实用方法

获取原文
获取原文并翻译 | 示例
       

摘要

Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures for which coarse-grained parallelism and locality are crucial, consider only a subspace of transformations to avoid a combinatorial explosion in finding transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations: in particular, transformation compositions involving loop reversals and loop skewing by negative factors. In addition, there is currently no proof that the algorithm successfully finds a tree of permutable loop bands for all affine loop nests. In this article, we propose an approach to address these two issues (1) by modeling a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, and (2) by extending the Pluto algorithm in a way that allows a proof for its soundness and completeness for all affine loop nests. We perform an experimental evaluation of both, the effect on compilation time, and performance of generated codes. The evaluation shows that our new framework, Pluto+, provides no degradation in performance for any benchmark from Polybench. For the Lattice Boltzmann Method (LBM) simulations with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not increase compilation time significantly. Experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases in which it improves execution time significantly, it increased polyhedral optimization time by only 2.04x.
机译:仿射变换已被证明具有强大的建模能力,因为它们具有建模各种变换的能力。单个多维仿射函数可以表示一个较长且复杂的序列,其中包含更简单的转换。现有的仿射变换框架(如Pluto算法)包括现代多核体系结构的成本函数,对于这些模型,粗糙粒度的并行性和局部性至关重要,因此仅考虑变换的子空间,以避免在寻找变换时组合爆炸。随后的实际取舍导致排除了某些有用的转换:特别是涉及循环反转和负因素引起的循环倾斜的转换组合。另外,目前尚无证据表明该算法能成功找到所有仿射环路嵌套的可置换环路带树。在本文中,我们提出一种解决这两个问题的方法(1)通过结合实际的Pluto成本函数对更大范围的实际有用仿射变换进行建模,以及(2)通过扩展Pluto算法,可以证明其对所有仿射环嵌套的健全性和完整性。我们对编译时间的影响和所生成代码的性能都进行了实验评估。评估表明,对于Polybench的任何基准测试,我们的新框架Pluto +都不会降低性能。对于具有周期性边界条件的Lattice Boltzmann方法(LBM)模拟,它的平均速度比Pluto高1.33倍。我们还表明,Pluto +不会显着增加编译时间。在Polybench上进行的实验结果表明,Pluto +仅将整体多面体源间优化时间增加了15%。在显着缩短执行时间的情况下,它仅将多面体优化时间增加了2.04倍。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号