Toward Harnessing DOACROSS Parallelism for Multi-GPGPUs



Abstract

To exploit the full potential of GPGPUs for general-purpose computing, the DOACROSS parallelism abundant in scientific and engineering applications must be harnessed. However, the cross-iteration data dependences in DOACROSS loops are an obstacle to executing their computations concurrently with a massive number of fine-grained threads. This work focuses on iterative PDE solvers rich in DOACROSS parallelism to identify optimization principles and strategies that allow them to be mapped efficiently to GPGPUs. Our main finding is that certain DOACROSS loops can be accelerated further on GPGPUs if they are algorithmically restructured (by a domain expert) to be more amenable to GPGPU parallelization, judiciously optimized (by the compiler), and carefully tuned (by a performance-tuning tool). We substantiate this finding with a case study, presenting a new parallel SSOR method that admits more efficient data-parallel SIMD execution than red-black SOR on GPGPUs. Our solution is obtained non-conventionally: we start from a K-layer SSOR method and parallelize it with a non-dependence-preserving scheme consisting of a new domain decomposition technique followed by a generalized loop tiling. Despite its relatively slower convergence, our new method outperforms red-black SOR by striking a better balance between data reuse and parallelism and by trading convergence rate for SIMD parallelism. Our experimental results highlight the importance of synergy between domain experts, compiler optimizations, and performance tuning in maximizing the performance of applications, particularly PDE-based DOACROSS loops, on GPGPUs.
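To make the cross-iteration dependences mentioned in the abstract concrete, here is a minimal Python sketch of the red-black SOR baseline that the abstract compares against, next to a plain lexicographic SOR sweep. It is not the paper's new SSOR method. The 5-point Laplace stencil, the grid setup, and all function names (`sor_sweep`, `red_black_sweep`, `make_grid`, `max_interior_error`) are assumptions chosen for illustration only.

```python
def sor_sweep(u, omega):
    """One lexicographic SOR sweep (in place) for the 5-point Laplace stencil.

    Each update reads u[i-1][j] and u[i][j-1], which were already updated in
    this same sweep: a cross-iteration (DOACROSS-style) dependence.
    """
    n = len(u)
    for i in range(1, n - 1):
        for j in range(1, n - 1):
            avg = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
            u[i][j] += omega * (avg - u[i][j])


def red_black_sweep(u, omega):
    """One red-black SOR sweep (in place).

    Points of one color only read points of the other color, so all updates
    within a color are independent and could run as parallel threads.
    """
    n = len(u)
    for color in (0, 1):
        for i in range(1, n - 1):
            for j in range(1, n - 1):
                if (i + j) % 2 == color:
                    avg = 0.25 * (u[i - 1][j] + u[i + 1][j]
                                  + u[i][j - 1] + u[i][j + 1])
                    u[i][j] += omega * (avg - u[i][j])


def make_grid(n, boundary=1.0):
    """n x n grid, zero interior, constant value on the boundary (assumed test problem)."""
    u = [[0.0] * n for _ in range(n)]
    for k in range(n):
        u[0][k] = u[n - 1][k] = u[k][0] = u[k][n - 1] = boundary
    return u


def max_interior_error(u, exact=1.0):
    """Largest deviation of interior points from the known constant solution."""
    n = len(u)
    return max(abs(u[i][j] - exact)
               for i in range(1, n - 1) for j in range(1, n - 1))
```

With a constant boundary value, the exact solution of the discrete Laplace problem is that same constant, so repeated sweeps of either variant should drive `max_interior_error` toward zero. The two orderings differ in which neighbor values are already updated when a point is visited, which is the trade-off between dependence structure and parallelism that the abstract refers to.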


Bibliographic record

Source
The 39th International Conference on Parallel Processing | 2010 | pp. 40-50 | 11 pages
Authors
Di Peng; Wan Qing; Zhang Xuemeng; Wu Hui; Xue Jingling
Original format: PDF
Chinese Library Classification: parallel computers
Keywords
DOACROSS Parallelism; GPGPU; Loop Tiling; SOR


