The Pluto+ Algorithm: A Practical Approach for Parallelization and Locality Optimization of Affine Loop Nests

Bondhugula Uday; Acharya Aravind; Cohen Albert

Abstract

Affine transformations have proven to be powerful for loop restructuring due to their ability to model a very wide range of transformations. A single multidimensional affine function can represent a long and complex sequence of simpler transformations. Existing affine transformation frameworks such as the Pluto algorithm, which include a cost function for modern multicore architectures for which coarse-grained parallelism and locality are crucial, consider only a subspace of transformations to avoid a combinatorial explosion in finding transformations. The ensuing practical trade-offs lead to the exclusion of certain useful transformations: in particular, transformation compositions involving loop reversals and loop skewing by negative factors. In addition, there is currently no proof that the algorithm successfully finds a tree of permutable loop bands for all affine loop nests. In this article, we propose an approach that addresses these two issues: (1) by modeling a much larger space of practically useful affine transformations in conjunction with the existing cost function of Pluto, and (2) by extending the Pluto algorithm in a way that allows a proof of its soundness and completeness for all affine loop nests. We experimentally evaluate both the effect on compilation time and the performance of the generated code. The evaluation shows that our new framework, Pluto+, causes no performance degradation on any Polybench benchmark. For Lattice Boltzmann Method (LBM) simulations with periodic boundary conditions, it provides a mean speedup of 1.33x over Pluto. We also show that Pluto+ does not significantly increase compilation time: on Polybench, it increases overall polyhedral source-to-source optimization time by only 15%, and in the cases where it significantly improves execution time, it increases polyhedral optimization time by only 2.04x.
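
Two ideas in the abstract can be illustrated with a small sketch: one affine map can stand for a composition of simpler transformations, and excluding negative coefficients (loop reversals, negative skews) can rule out a useful transformation. The sketch assumes constant dependence distance vectors for a hypothetical 2-D loop nest over (i, j). The vectors, the value N = 100, the coefficient ranges [0, 4] and [-4, 4], and every function name are illustrative assumptions, not taken from the article. The brute-force enumeration is only a stand-in for the ILP-based search that Pluto and Pluto+ perform, and the non-negative range is a simplified model of the restricted subspace.

```python
from itertools import product

# Hypothetical constant dependence distance vectors (d_i, d_j) for a 2-D
# loop nest over (i, j). The second vector mimics a wrap-around dependence,
# such as one a periodic boundary might cause, with an assumed extent N = 100.
N = 100
DEPS = [(1, 1), (1, -(N - 1))]


def matmul(a, b):
    """Return the 2x2 integer matrix product a * b (rows as tuples)."""
    return tuple(
        tuple(sum(a[r][k] * b[k][c] for k in range(2)) for c in range(2))
        for r in range(2)
    )


def dot(h, d):
    """Apply hyperplane h (one schedule row) to distance vector d."""
    return sum(x * y for x, y in zip(h, d))


# Elementary loop transformations as matrices acting on (i, j).
REVERSE_J = ((1, 0), (0, -1))   # (i, j) -> (i, -j): reverse the j loop
SKEW_J_BY_I = ((1, 0), (1, 1))  # (i, j) -> (i, j + i): skew j by a factor of 1

# Reversal followed by skewing collapses into one matrix: (i, j) -> (i, i - j).
COMPOSED = matmul(SKEW_J_BY_I, REVERSE_J)


def is_permutable_band(rows, deps):
    """True if h . d >= 0 for every row h and every dependence d, i.e. the
    rows can be freely permuted (and tiled) for these constant distances."""
    return all(dot(h, d) >= 0 for h in rows for d in deps)


def valid_hyperplanes(deps, lo, hi):
    """Brute-force all nonzero (a, b) with lo <= a, b <= hi that satisfy
    a*d_i + b*d_j >= 0 for every dependence (a stand-in for an ILP search)."""
    return [
        h
        for h in product(range(lo, hi + 1), repeat=2)
        if h != (0, 0) and all(dot(h, d) >= 0 for d in deps)
    ]


def has_2d_band(hyperplanes):
    """True if two linearly independent valid hyperplanes exist."""
    return any(
        h1[0] * h2[1] - h1[1] * h2[0] != 0
        for h1 in hyperplanes
        for h2 in hyperplanes
    )


if __name__ == "__main__":
    print("composed map rows:", COMPOSED)
    print("original order permutable:", is_permutable_band(((1, 0), (0, 1)), DEPS))
    print("composed map permutable:", is_permutable_band(COMPOSED, DEPS))
    print("2-D band with coefficients in [0, 4]:",
          has_2d_band(valid_hyperplanes(DEPS, 0, 4)))
    print("2-D band with coefficients in [-4, 4]:",
          has_2d_band(valid_hyperplanes(DEPS, -4, 4)))
```

Running the script reports three results. The original (i, j) order is not a permutable band, and the composed map ((1, 0), (1, -1)) is one. A two-dimensional band exists only once negative coefficients are admitted, because with non-negative coefficients any row (a, b) with b > 0 would need a >= 99b to respect the (1, -99) dependence.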

Bibliographic Record

Source
ACM Transactions on Programming Languages and Systems | 2016, Issue 3 | pp. 12.1-12.32 | 32 pages
Authors
Bondhugula Uday; Acharya Aravind; Cohen Albert;
Affiliations

Indian Institute of Science, Department of Computer Science and Automation, Bangalore 560012, Karnataka, India;

Indian Institute of Science, Department of Computer Science and Automation, Bangalore 560012, Karnataka, India;

INRIA, 45 Rue d'Ulm, F-75005 Paris, France, and ENS, 45 Rue d'Ulm, F-75005 Paris, France;

Indexed in: Science Citation Index (SCI)
Original format: PDF
Language: English
Keywords
Algorithms; Design; Experimentation; Performance; Automatic parallelization; locality optimization; polyhedral model; loop transformations; affine transformations; tiling;

Similar Documents

1. Nested-Loops Tiling for Parallelization and Locality Optimization [J]. Parsa Saeed, Hamzei Mohammad. Computing and Informatics, 2017, Issue 3.

2. Algorithmic Species: A Classification of Affine Loop Nests for Parallel Programming [J]. Cedric Nugteren, Pieter Custers, Henk Corporaal. ACM Transactions on Architecture and Code Optimization, 2012, Issue 4.

3. A Compiler Driven Out-of-Core Programming Approach for Optimizing Data Locality in Loop Nests [C]. Wendy Zhang, Ernst L. Leiss. International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'2001), Vol. 1, Jun 25-28, 2001, Las Vegas, Nevada, USA.

4. Tools for performance optimizations and tuning of affine loop nests [D]. Hartono, Albert. 2010.

5. Parallel tiled Nussinov RNA folding loop nest generated using both dependence graph transitive closure and loop skewing [O]. Marek Palkowski, Wlodzimierz Bielecki. 2017.

6. PVL: Parallelization and Vectorization of Affine Perfectly Nested-Loops Considering Data Locality on Short-Vector Multicore Processors using Intrinsic Vectorization [O]. Yousef Seyfari, Shahriar Lotfi, Jaber Karimpour. 2017.
