Journal: ACM SIGPLAN Notices: A Monthly Publication of the Special Interest Group on Programming Languages

PLUTO plus: Near-Complete Modeling of Affine Transformations for Parallelism and Locality

Abstract

Affine transformations have proven to be very powerful for loop restructuring because they can model a very wide range of transformations. A single multi-dimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures where coarse-grained parallelism and locality are crucial, consider only a sub-space of transformations to avoid a combinatorial explosion in finding the transformations. The ensuing practical tradeoffs lead to the exclusion of certain useful transformations, in particular transformation compositions involving loop reversals and loop skewing by negative factors. In this paper, we propose an approach to address this limitation by modeling a much larger space of affine transformations in conjunction with the Pluto algorithm's cost function. We perform an experimental evaluation of both the effect on compilation time and the performance of the generated code. The evaluation shows that our new framework, Pluto+, causes no performance degradation on any of the Polybench benchmarks. For Lattice Boltzmann Method (LBM) codes with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not increase compile times significantly. Experimental results on Polybench show that Pluto+ increases overall polyhedral source-to-source optimization time by only 15%. In cases where it improves execution time significantly, it increases polyhedral optimization time by only 2.04x.
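The claim that one affine function can stand in for a sequence of simpler transformations can be illustrated with a small sketch. It is not taken from the paper or from the Pluto or Pluto+ implementations. It assumes a two-deep loop nest with iterators (i, j), models only the linear part of each transformation as a 2x2 integer matrix (no constant offsets), and ignores dependence legality altogether. The names `matmul`, `apply`, `REVERSE_J`, `SKEW_J_BY_I`, and `INTERCHANGE` are hypothetical and chosen for this illustration.

```python
# Illustrative sketch only: 2-D loop transformations as integer matrices on (i, j).
# Names and the choice of example transformations are assumptions, not Pluto+ code.

def matmul(a, b):
    """Multiply two 2x2 integer matrices given as nested lists."""
    return [[sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2)]
            for r in range(2)]

def apply(t, point):
    """Map an iteration point (i, j) through the linear transformation t."""
    i, j = point
    return (t[0][0] * i + t[0][1] * j, t[1][0] * i + t[1][1] * j)

REVERSE_J = [[1, 0], [0, -1]]    # (i, j) -> (i, -j): reverse the inner loop
SKEW_J_BY_I = [[1, 0], [1, 1]]   # (i, j) -> (i, i + j): skew inner loop by outer
INTERCHANGE = [[0, 1], [1, 0]]   # (i, j) -> (j, i): swap the two loops

# Reversal first, then skewing, then interchange, folded into one matrix.
composed = matmul(INTERCHANGE, matmul(SKEW_J_BY_I, REVERSE_J))

def apply_in_sequence(point):
    """Apply the three simple transformations one after another."""
    return apply(INTERCHANGE, apply(SKEW_J_BY_I, apply(REVERSE_J, point)))

if __name__ == "__main__":
    print(composed)                 # the single matrix replacing three steps
    print(apply(composed, (2, 3)))  # same result as apply_in_sequence((2, 3))
```

In this sketch the composed matrix maps (i, j) to (i - j, i). The negative coefficient comes from the reversal, which is the kind of composition the abstract says is excluded when only a sub-space of transformations is searched.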
